Transferrin receptor genes of Moraxella

ABSTRACT

Purified and isolated nucleic acid molecules are provided which encode transferrin receptor proteins of Moraxella, such as M. catarrhalis, or a fragment or an analog of the transferrin receptor protein. The nucleic acid sequence may be used to produce recombinant transferrin receptor proteins Tbp1 and Tbp2 of the strain of Moraxella free of other proteins of the Moraxella strain for purposes of diagnostics and medical treatment. Furthermore, the nucleic acid molecule may be used in the diagnosis of infection.

REFERENCE TO RELATED APPLICATION

This application is a continuation-in-part of International Patent Application No. PCT/CA97/00163 filed Mar. 7, 1997, which itself is a continuation-in-part of copending U.S. patent application Ser. No. 08/778,570 filed Jan. 3, 1997, which itself is a continuation-in-part of U.S. patent application Ser. No. 08/613,009 filed Mar. 8, 1996.

FIELD OF INVENTION

The present invention relates to the molecular cloning of genes encoding transferrin receptor (TfR) proteins and, in particular, to the cloning of transferrin receptor genes from Moraxella (Branhamella) catarrhalis.

BACKGROUND OF THE INVENTION

Moraxella (Branhamella) catarrhalis bacteria are Gram-negative diplococcal pathogens which are carried asymptomatically in the healthy human respiratory tract. In recent years, M. catarrhalis has been recognized as an important causative agent of otitis media. In addition, M. catarrhalis has been associated with sinusitis, conjunctivitis, and urogenital infections, as well as with a number of inflammatory diseases of the lower respiratory tract in children and adults, including pneumonia, chronic bronchitis, tracheitis, and emphysema (refs. 1 to 8). (Throughout this application, various references are cited in parentheses to describe more fully the state of the art to which this invention pertains. Full bibliographic information for each citation is found at the end of the specification, immediately preceding the claims. The disclosures of these references are hereby incorporated by reference into the present disclosure.) Occasionally, M. catarrhalis invades to cause septicaemia, arthritis, endocarditis, and meningitis (refs. 9 to 13).

Otitis media is one of the most common illnesses of early childhood and approximately 80% of all children suffer at least one middle ear infection before the age of three (ref. 14). Chronic otitis media has been associated with auditory and speech impairment in children, and in some cases, has been associated with learning disabilities. Conventional treatments for otitis media include antibiotic administration and surgical procedures, including tonsillectomies, adenoidectomies, and tympanocentesis. In the United States, treatment costs for otitis media are estimated to be between one and two billion dollars per year.

In otitis media cases, M. catarrhalis commonly is co-isolated from middle ear fluid along with Streptococcus pneumoniae and non-typable Haemophilus influenzae, which are believed to be responsible for 50% and 30% of otitis media infections, respectively. M. catarrhalis is believed to be responsible for approximately 20% of otitis media infections (ref. 15). Epidemiological reports indicate that the number of cases of otitis media attributable to M. catarrhalis is increasing, along with the number of antibiotic-resistant isolates of M. catarrhalis. Thus, prior to 1970, no β-lactamase-producing M. catarrhalis isolates had been reported, but since the mid-seventies, an increasing number of β-lactamase-expressing isolates have been detected. Recent surveys suggest that 75% of clinical isolates produce β-lactamase (refs. 16, 26).

Iron is an essential nutrient for the growth of many bacteria. Several bacterial species, including M. catarrhalis, obtain iron from the host by using transferrin receptor proteins to capture transferrin. A number of bacteria including Neisseria meningitidis (ref. 17), N. gonorrhoeae (ref. 18), Haemophilus influenzae (ref. 19), as well as M. catarrhalis (ref. 20), produce outer membrane proteins which specifically bind human transferrin. The expression of these proteins is regulated by the amount of iron in the environment.

The two transferrin receptor proteins of M. catarrhalis, designated transferrin binding protein 1 (Tbp1) and transferrin binding protein 2 (Tbp2), have molecular weights of 115 kDa (Tbp1) and approximately 80 to 90 kDa (Tbp2). Unlike the transferrin receptor proteins of other bacteria which have an affinity for apotransferrin, the M. catarrhalis Tbp2 receptors have a preferred affinity for iron-saturated (i.e., ferri-) transferrin (ref. 21).

M. catarrhalis infection may lead to serious disease. It would be advantageous to provide a recombinant source of transferrin binding proteins for use as antigens in immunogenic preparations, including vaccines, as carriers for other antigens and immunogens, and in the generation of diagnostic reagents. The genes encoding transferrin binding proteins and fragments thereof are particularly desirable and useful in the specific identification and diagnosis of Moraxella, for immunization against disease caused by M. catarrhalis and for the generation of diagnostic reagents.

SUMMARY OF THE INVENTION

The present invention is directed towards the provision of purified and isolated nucleic acid molecules encoding a transferrin receptor of a strain of Moraxella or a fragment or an analog of the transferrin receptor protein. The nucleic acid molecules provided herein are useful for the specific detection of strains of Moraxella and for diagnosis of infection by Moraxella. The purified and isolated nucleic acid molecules provided herein, such as DNA, are also useful for expressing the tbp genes by recombinant DNA means for providing, in an economical manner, purified and isolated transferrin receptor proteins as well as subunits, fragments or analogs thereof. The transferrin receptor, subunits or fragments thereof or analogs thereof, as well as nucleic acid molecules encoding the same and vectors containing such nucleic acid molecules, are useful in immunogenic compositions for vaccinating against diseases caused by Moraxella, the diagnosis of infection by Moraxella and as tools for the generation of immunological reagents. Monoclonal antibodies or mono-specific antisera (antibodies) raised against the transferrin receptor protein, produced in accordance with aspects of the present invention, are useful for the diagnosis of infection by Moraxella, the specific detection of Moraxella (in, for example, in vitro and in vivo assays) and for the treatment of diseases caused by Moraxella.

In accordance with one aspect of the present invention, there is provided a purified and isolated nucleic acid molecule encoding a transferrin receptor protein of a strain of Moraxella, more particularly, a strain of M. catarrhalis, specifically M. catarrhalis strain 4223, Q8, R1, M35, 3 or LES1, or a fragment or an analog of the transferrin receptor protein.

In one preferred embodiment of the invention, the nucleic acid molecule may encode only the Tbp1 protein of the Moraxella strain or only the Tbp2 protein of the Moraxella strain. In another preferred embodiment of the invention, the nucleic acid may encode a fragment of the transferrin receptor protein of a strain of Moraxella having an amino acid sequence which is conserved.

In another aspect of the present invention, there is provided a purified and isolated nucleic acid molecule having a DNA sequence selected from the group consisting of (a) a DNA sequence as set out in FIG. 5, 6, 10, 11, 27, 31, 32 or 33 (SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 45, 47, 48, 50 or 52) or the complementary DNA sequence thereto; (b) a DNA sequence encoding an amino acid sequence as set out in FIG. 5, 6, 10, 11, 27, 31, 32 or 33 (SEQ ID NOS: 9, 10, 11, 12, 13, 14, 15, 16, 46, 49, 51 or 53) or the complementary DNA sequence thereto; and (c) a DNA sequence encoding a functional transferrin receptor protein of a strain of Moraxella, which may be a DNA sequence which hybridizes under stringent conditions to any one of the DNA sequences defined in (a) or (b). The DNA sequence defined in (c) may have at least about 90% sequence identity with any one of the DNA sequences defined in (a) and (b). The functional transferrin receptor protein of a strain of Moraxella encoded by the DNA sequence defined in (c) is the equivalent transferrin receptor protein from another strain of Moraxella.
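By way of illustration only, the "at least about 90% sequence identity" criterion can be checked on two sequences that have already been aligned. The sketch below is an assumption-laden example, not a method taken from this specification: the function names are hypothetical, the inputs are taken to be pre-aligned strings of equal length with "-" marking gaps, and columns containing a gap are excluded from the comparison. Other identity conventions (for example, counting gap columns as mismatches) would give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: the inputs are already aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are skipped under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 90.0) -> bool:
    """True when the two aligned sequences reach the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two aligned ten-base sequences that differ at a single position score 90% under this convention and would therefore just meet the threshold.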

In an additional aspect, the present invention includes a vector adapted for transformation of a host, comprising a nucleic acid molecule as provided herein, which vector may have the characteristics of a nucleotide sequence contained within vectors LEM3-24, pLEM3, pLEM25, pLEM23, SLRD-A, DS-1698-1-1, DS-1754-1, pSLRD2, pSLRD3, pSLRD4 and pSLRD5.

The vector may be adapted for expression of the encoded transferrin receptor, fragments or analogs thereof, in a heterologous or homologous host, in either a lipidated or non-lipidated form. Accordingly, a further aspect of the present invention provides an expression vector adapted for transformation of a host comprising a nucleic acid molecule as provided herein and expression means operatively coupled to the nucleic acid molecule for expression by the host of the transferrin receptor protein or the fragment or analog of the transferrin receptor protein. In specific embodiments of this aspect of the invention, the nucleic acid molecule may encode substantially all the transferrin receptor protein, only the Tbp1 protein, only the Tbp2 protein of the Moraxella strain or fragments of the Tbp1 or Tbp2 proteins. The expression means may include a promoter and a nucleic acid portion encoding a leader sequence for secretion from the host of the transferrin receptor protein or the fragment or the analog of the transferrin receptor protein. The expression means also may include a nucleic acid portion encoding a lipidation signal for expression from the host of a lipidated form of the transferrin receptor protein or the fragment or the analog of the transferrin receptor protein. The host may be selected from, for example, Escherichia coli, Bordetella, Bacillus, Haemophilus, Moraxella, fungi and yeast, or baculovirus and Semliki Forest virus expression systems may be used. In a particular embodiment, the plasmid adapted for expression of Tbp1 is pLEM29 and that for expression of Tbp2 is pLEM33. Further vectors include pLEM-37, SLRD35-A and SLRD-35-B.

In an additional aspect of the invention, there is provided a transformed host containing an expression vector as provided herein. The invention further includes a recombinant transferrin receptor protein or fragment or analog thereof of a strain of Moraxella producible by the transformed host.

Such recombinant transferrin receptor protein may be provided in substantially pure form according to a further aspect of the invention, which provides a method of forming a substantially pure recombinant transferrin receptor protein, which comprises growing the transformed host provided herein to express a transferrin receptor protein as inclusion bodies, purifying the inclusion bodies free from cellular material and soluble proteins, solubilizing transferrin receptor protein from the purified inclusion bodies, and purifying the transferrin receptor protein free from other solubilized materials. The substantially pure recombinant transferrin receptor protein may comprise Tbp1 alone, Tbp2 alone or a mixture thereof. The recombinant protein is generally at least about 70% pure, preferably at least about 90% pure.

Further aspects of the present invention, therefore, provide recombinantly-produced Tbp1 protein of a strain of Moraxella devoid of the Tbp2 protein of the Moraxella strain and any other protein of the Moraxella strain and recombinantly-produced Tbp2 protein of a strain of Moraxella devoid of the Tbp1 protein of the Moraxella strain and any other protein of the Moraxella strain. The Moraxella strain may be M. catarrhalis 4223 strain, M. catarrhalis Q8 strain, M. catarrhalis R1 strain, M. catarrhalis M35 strain, M. catarrhalis 3 strain or M. catarrhalis LES1 strain.

In accordance with another aspect of the invention, an immunogenic composition is provided which comprises at least one active component selected from at least one nucleic acid molecule as provided herein and at least one recombinant protein as provided herein, and a pharmaceutically acceptable carrier therefor or vector therefor. The at least one active component produces an immune response when administered to a host.

The immunogenic compositions provided herein may be formulated as vaccines for in vivo administration to a host. For such purpose, the compositions may be formulated as a microparticle, capsule, ISCOM (immunostimulatory complex) or liposome preparation. The immunogenic composition may be provided in combination with a targeting molecule for delivery to specific cells of the immune system or to mucosal surfaces. The immunogenic compositions of the invention (including vaccines) may further comprise at least one other immunogenic or immunostimulating material and the immunostimulating material may be at least one adjuvant or at least one cytokine. Suitable adjuvants for use in the present invention include (but are not limited to) aluminum phosphate, aluminum hydroxide, QS21, Quil A, derivatives and components thereof, ISCOM matrix, calcium phosphate, calcium hydroxide, zinc hydroxide, a glycolipid analog, an octadecyl ester of an amino acid, a muramyl dipeptide, polyphosphazene, ISCOPREP, DC-chol, DDBA and a lipoprotein. Advantageous combinations of adjuvants are described in copending U.S. patent applications Ser. No. 08/261,194 filed Jun. 16, 1994 and Ser. No. 08/483,856 filed Jun. 7, 1995, assigned to the assignee hereof and the disclosures of which are incorporated herein by reference thereto (WO 95/34308).

In accordance with another aspect of the invention, there is provided a method for generating an immune response in a host, comprising the step of administering to a susceptible host, such as a human, an effective amount of the immunogenic composition provided herein. The immune response may be a humoral or a cell-mediated immune response and may provide protection against disease caused by Moraxella. Hosts in which protection against disease may be conferred include primates, including humans.

In a further aspect, there is provided a live vector for delivery of transferrin receptor to a host, comprising a vector containing the nucleic acid molecule as described above. The vector may be selected from Salmonella, BCG, adenovirus, poxvirus, vaccinia and poliovirus.

The nucleic acid molecules provided herein are useful in diagnostic applications. Accordingly, in a further aspect of the invention, there is provided a method of determining the presence, in a sample, of nucleic acid encoding a transferrin receptor protein of a strain of Moraxella, comprising the steps of:

(a) contacting the sample with a nucleic acid molecule as provided herein to produce duplexes comprising the nucleic acid molecule and any nucleic acid molecule encoding the transferrin receptor protein of a strain of Moraxella present in the sample and specifically hybridizable therewith; and

(b) determining the production of the duplexes.

In addition, the present invention provides a diagnostic kit for determining the presence, in a sample, of nucleic acid encoding a transferrin receptor protein of a strain of Moraxella, comprising:

(a) a nucleic acid molecule as provided herein;

(b) means for contacting the nucleic acid molecule with the sample to produce duplexes comprising the nucleic acid molecule and any such nucleic acid present in the sample and hybridizable with the nucleic acid molecule; and

(c) means for determining production of the duplexes.

The invention further includes the use of the nucleic acid molecules and proteins provided herein as medicines. The invention additionally includes the use of the nucleic acid molecules and proteins provided herein in the manufacture of medicaments for protection against infection by strains of Moraxella.

Advantages of the present invention include:

an isolated and purified nucleic acid molecule encoding a transferrin receptor protein of a strain of Moraxella or a fragment or an analog of the transferrin receptor protein;

recombinantly-produced transferrin receptor proteins, including Tbp1 and Tbp2, free from each other and other Moraxella proteins; and

diagnostic kits and immunological reagents for specific identification of Moraxella.

BRIEF DESCRIPTION OF DRAWINGS

The present invention will be further understood from the following description with reference to the drawings, in which:

FIG. 1 shows the amino acid sequences (SEQ ID NOS: 17 and 18) of a conserved portion of Tbp1 proteins used for synthesis of degenerate primers used for PCR amplification of a portion of the M. catarrhalis 4223 tbpA gene;

FIG. 2 shows a restriction map of clone LEM3-24 containing the tbpA and tbpB genes and orf 3 gene from M. catarrhalis isolate 4223;

FIG. 3 shows a restriction map of the tbpA gene for M. catarrhalis 4223;

FIG. 4 shows a restriction map of the tbpB gene for M. catarrhalis 4223;

FIGS. 5A to 5J show the nucleotide sequence of the tbpA gene (SEQ ID NO: 1, entire sequence; SEQ ID NO: 2, coding sequence) and the deduced amino acid sequence of the Tbp1 protein from M. catarrhalis 4223 (SEQ ID NO: 9, full length; SEQ ID NO: 10, mature protein). The leader sequence (SEQ ID NO: 19) is shown by underlining;

FIGS. 6A to 6G show the nucleotide sequence of the tbpB gene (SEQ ID NO: 3, entire sequence; SEQ ID NO: 4, coding sequence) and the deduced amino acid sequence of the Tbp2 protein from M. catarrhalis 4223 (SEQ ID NO: 11, full length; SEQ ID NO: 12, mature protein). The leader sequence (SEQ ID NO: 20) is shown by underlining;

FIG. 7 shows a restriction map of clone SLRD-A containing the tbpA and tbpB genes and orf 3 gene from M. catarrhalis Q8;

FIG. 8 shows a restriction map of the tbpA gene from M. catarrhalis Q8;

FIG. 9 shows a restriction map of the tbpB gene from M. catarrhalis Q8;

FIGS. 10A to 10Q show the nucleotide sequence of the tbpA gene (SEQ ID NO: 5, entire sequence; SEQ ID NO: 6, coding sequence) and the deduced amino acid sequence of the Tbp1 protein from M. catarrhalis Q8 (SEQ ID NO: 13, full length; SEQ ID NO: 14, mature protein);

FIGS. 11A to 11O show the nucleotide sequence of the tbpB gene (SEQ ID NO: 7, entire sequence; SEQ ID NO: 8, coding sequence) and the deduced amino acid sequence of the Tbp2 protein from M. catarrhalis Q8 (SEQ ID NO: 15, full length; SEQ ID NO: 16, mature protein);

FIGS. 12A to 12G show a comparison of the amino acid sequences of Tbp1 from M. catarrhalis strain 4223 (SEQ ID NO: 9) and Q8 (SEQ ID NO: 13), H. influenzae strain Eagan (SEQ ID NO: 21), N. meningitidis strains B16B6 (SEQ ID NO: 22) and M982 (SEQ ID NO: 23), and N. gonorrhoeae strain FA19 (SEQ ID NO: 24). Dots indicate identical residues and dashes have been inserted for maximum alignment;

FIGS. 13A to 13F show a comparison of the amino acid sequences of Tbp2 from M. catarrhalis isolate 4223 (SEQ ID NO: 11) and Q8 (SEQ ID NO: 15), H. influenzae strain Eagan (SEQ ID NO: 25), N. meningitidis strains B16B6 (SEQ ID NO: 26) and M982 (SEQ ID NO: 27), and N. gonorrhoeae strain FA19 (SEQ ID NO: 28). Dots indicate identical residues and dashes have been inserted for maximum alignment;

FIGS. 14A and 14B show the construction of plasmid pLEM29 for expression of recombinant Tbp1 protein from E. coli;

FIG. 15 shows an SDS-PAGE analysis of the expression of Tbp1 protein by E. coli cells transformed with plasmid pLEM29;

FIG. 16 shows a flow chart for purification of recombinant Tbp1 protein;

FIG. 17 shows an SDS-PAGE analysis of purified recombinant Tbp1 protein;

FIGS. 18A and 18B show the construction of plasmids pLEM33 and pLEM37 for expression of the tbpB gene from M. catarrhalis 4223 in E. coli without and with a leader sequence respectively;

FIG. 19 shows an SDS-PAGE analysis of the expression of rTbp2 protein by E. coli cells transformed with plasmid pLEM37;

FIGS. 20A and 20B show the construction of plasmid SLRD35B for expression of the tbpB gene from M. catarrhalis Q8 in E. coli without a leader sequence, and the construction of plasmid SLRD35A for expression of the tbpB gene from M. catarrhalis Q8 in E. coli with a leader sequence. Restriction site B=BamHI; Bg=Bgl II; H=Hind III; R=EcoRI;

FIG. 21 shows an SDS-PAGE analysis of the expression of rTbp2 protein in E. coli cells transformed with plasmids SLRD35A and SLRD35B;

FIG. 22 shows a flow chart for purification of recombinant Tbp2 protein from E. coli;

FIG. 23, which includes Panels A and B, shows an SDS-PAGE analysis of the purification of recombinant Tbp2 protein from M. catarrhalis strains 4223 (Panel A) and Q8 (Panel B) from expression in E. coli;

FIG. 24 shows the binding of Tbp2 to human transferrin;

FIG. 25, which includes Panels A, B and C, shows the antigenic conservation of Tbp2 protein amongst strains of M. catarrhalis;

FIG. 26 shows a partial restriction map of the M. catarrhalis strain M35 tbpB gene;

FIGS. 27A to 27K show the nucleotide sequence of the tbpB gene (SEQ ID NO: 45) and deduced amino acid sequence of the Tbp2 protein of M. catarrhalis strain M35 (SEQ ID NO: 46);

FIG. 28 shows a restriction map of the tbpB gene for M. catarrhalis R1;

FIG. 29 shows a partial restriction map of the tbpB gene for M. catarrhalis strain 3;

FIG. 30 shows a partial restriction map of the tbpB gene for M. catarrhalis strain LES1;

FIGS. 31A to 31G show the nucleotide sequence of the tbpB gene (SEQ ID NO: 47, entire sequence; SEQ ID NO: 48, coding sequence) and the deduced amino acid sequence of the Tbp2 protein of M. catarrhalis strain R1 (SEQ ID NO: 49);

FIGS. 32A to 32K show the nucleotide sequence of the tbpB gene (SEQ ID NO: 50) and the deduced amino acid sequence of the Tbp2 protein of M. catarrhalis strain 3 (SEQ ID NO: 51);

FIGS. 33A to 33K show the nucleotide sequence of the tbpB gene (SEQ ID NO: 52) and the deduced amino acid sequence of the Tbp2 protein of M. catarrhalis strain LES1 (SEQ ID NO: 53);

FIGS. 34A to 34D show an alignment of the Tbp2 proteins from strains 4223 (SEQ ID NO: 11), R1 (SEQ ID NO: 49), M35 (SEQ ID NO: 46), LES1 (SEQ ID NO: 53), Q8 (SEQ ID NO: 15) and 3 (SEQ ID NO: 51). Dots indicate identical residues and spaces have been introduced to maximize the sequence alignment. Underlining indicates those sequences conserved amongst the M. catarrhalis Tbp2 proteins and those from A. pleuropneumoniae, H. influenzae, N. gonorrhoeae, N. meningitidis and P. haemolytica;

FIGS. 35A to 35M′ show the nucleotide and deduced amino acid sequences of the M. catarrhalis strain 4223 tbpA-orf3-tbpB gene locus (SEQ ID NO: 54, nucleotide sequence of intergenic region; SEQ ID NO: 55, orf3 coding sequence; SEQ ID NO: 56, ORF3 amino acid sequence); and

FIG. 36 shows an alignment of the ORF3 proteins from M. catarrhalis strain 4223 (SEQ ID NO: 56) and Q8 (SEQ ID NO: 57). Dots indicate identical residues.
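The dot convention used in the alignment figures (a dot where a residue is identical to the residue of the first-listed sequence, with gap characters retained) can be sketched as follows. This is an illustrative example only: the function name and the short peptide strings are hypothetical and are not taken from the figures, and the inputs are assumed to be pre-aligned strings of equal length with "-" marking gaps.

```python
def dot_render(reference: str, other: str) -> str:
    """Rewrite `other` with '.' wherever it matches `reference`.

    Assumption: both strings are already aligned (equal length, '-' for gaps).
    Gap characters are always kept so that alignment columns stay visible.
    """
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have equal length")
    rendered = []
    for ref_res, res in zip(reference, other):
        if res != "-" and res == ref_res:
            rendered.append(".")   # identical residue
        else:
            rendered.append(res)   # differing residue or gap
    return "".join(rendered)


# Hypothetical aligned fragments, for illustration only.
print(dot_render("MKHIP-LTT", "MKQIPALT-"))
```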

GENERAL DESCRIPTION OF THE INVENTION

Any Moraxella strain may be conveniently used to provide the purified and isolated nucleic acid, which may be in the form of DNA molecules, comprising at least a portion of the nucleic acid coding for a transferrin receptor as typified by embodiments of the present invention. Such strains are generally available from clinical sources and from bacterial culture collections, such as the American Type Culture Collection. Strains 4223, LES-1 and M35 are all derived from patients with otitis media while strains 3, R1 and Q8 were from sputum or bronchial secretions.

In this application, the terms “transferrin receptor” (TfR) and “transferrin binding proteins” (Tbp) are used to define a family of Tbp1 and/or Tbp2 proteins which includes those having variations in their amino acid sequences including those naturally occurring in various strains of, for example, Moraxella. The purified and isolated DNA molecules comprising at least a portion coding for transferrin receptor of the present invention also include those encoding functional analogs of transferrin receptor proteins Tbp1 and Tbp2 of Moraxella. In this application, a first protein is a “functional analog” of a second protein if the first protein is immunologically related to and/or has the same function as the second protein. The functional analog may be, for example, a fragment of the protein, or a substitution, addition or deletion mutant thereof.

Chromosomal DNA from M. catarrhalis 4223, a clinical isolate provided by Dr. T. Murphy (State University of New York, Buffalo, N.Y.), was digested with Sau3A in order to generate fragments within a 15 to 23 kb size range, and cloned into the BamHI site of the lambda vector EMBL3. The library was screened with anti-Tbp1 guinea pig antisera, and a positive clone LEM3-24, containing an insert approximately 13.2 kb in size, was selected for further analysis. Lysate from E. coli LE392 infected with LEM3-24 was found to contain a protein approximately 115 kDa in size, which reacted on Western blots with anti-Tbp1 antisera. A second protein, approximately 80 kDa in size, reacted with the anti-Tbp2 guinea pig antisera on Western blots.
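Sau3A fragments can be ligated into a BamHI site because both enzymes leave the same 5′-GATC overhang: Sau3A cuts immediately before GATC, and BamHI cuts GGATCC after the first G. The sketch below locates cut positions on one strand of a short, hypothetical toy sequence to show that every BamHI cut coincides with a Sau3A cut. The function names and the test sequence are assumptions made for illustration. A complete digest with a four-base cutter would give fragments far shorter than the 15 to 23 kb range described above, so that range presumably reflects a partial digest; this is an inference, not something the text states.

```python
import re

SAU3A_SITE = "GATC"      # Sau3A cuts immediately before the G: ^GATC
BAMHI_SITE = "GGATCC"    # BamHI cuts after the first G:        G^GATCC


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Top-strand cut indices for every (possibly overlapping) site occurrence."""
    pattern = f"(?={site})"  # zero-width lookahead so overlapping sites are found
    return [m.start() + offset for m in re.finditer(pattern, seq.upper())]


def fragments(seq: str, cuts: list[int]) -> list[str]:
    """Split the top strand at the given cut indices (complete digest)."""
    bounds = [0, *sorted(cuts), len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical toy sequence containing one plain Sau3A site and one BamHI site.
toy = "AAGATCTTTGGATCCAA"
sau3a_cuts = cut_positions(toy, SAU3A_SITE, 0)
bamhi_cuts = cut_positions(toy, BAMHI_SITE, 1)
print(sau3a_cuts, bamhi_cuts)
print(fragments(toy, sau3a_cuts))  # every internal fragment begins with GATC
```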

In order to localize the tbpA gene on the 13.2 kb insert of LEM3-24, degenerate PCR primers were used to amplify a small region of the putative tbpA gene of M. catarrhalis 4223. The sequences of the degenerate oligonucleotide primers were based upon conserved amino acid sequences within the Tbp1 proteins of several Neisseria and Haemophilus species and are shown in FIG. 1 (SEQ ID NOS: 17 and 18). A 300 base-pair amplified product was generated and its location within the 4223 tbpA gene is indicated by bold letters in FIG. 5 (SEQ ID NO: 29). The amplified product was subcloned into the vector pCRII, labelled, and used to probe a Southern blot containing restriction-endonuclease digested clone LEM3-24 DNA. The probe hybridized to a 3.8 kb HindIII-HindIII fragment, a 2.0 kb AvrII-AvrII fragment and a 4.2 kb SalI-SphI fragment (FIG. 2).

The 3.8 kb HindIII-HindIII fragment was subcloned into pACYC177, and sequenced. A large open reading frame was identified, and subsequently found to contain approximately 2 kb of the putative tbpA gene. The remaining 1 kb of the tbpA gene was obtained by subcloning an adjacent downstream HindIII-HindIII fragment into vector pACYC177. The nucleotide sequence of the tbpA gene from M. catarrhalis 4223 (SEQ ID NOS: 1 and 2), and the deduced amino acid sequence (SEQ ID NO: 9, full length; SEQ ID NO: 10, mature protein) are shown in FIGS. 5A to 5J.

Chromosomal DNA from M. catarrhalis strain Q8 was digested with Sau3A I and 15-23 kb fragments were ligated with BamHI arms of EMBL3. (Strain Q8 was a gift from Dr. M. G. Bergeron, Centre Hospitalier de l'Université Laval, St. Foy, Quebec.) A high titre library was generated in E. coli LE392 cells and was screened using oligonucleotide probes based on the 4223 tbpA sequence. Phage DNA was prepared and restriction enzyme analysis revealed that inserts of about 13-15 kb had been cloned. Phage clone SLRD-A was used to subclone fragments for sequence analysis. A cloning vector (pSKMA) was generated to facilitate cloning of the fragments and plasmids pSLRD1, pSLRD2, pSLRD3, pSLRD4 and pSLRD5 were generated which contain all of tbpA and most of tbpB. The nucleotide (SEQ ID NOS: 5 and 6) and deduced amino acid sequence (SEQ ID NO: 13, full length; SEQ ID NO: 14, mature protein) of the tbpA gene from strain Q8 are shown in FIGS. 10A to 10Q.

The deduced amino acid sequences for the Tbp1 protein encoded by the tbpA genes were found to share some homology with the amino acid sequences encoded by genes from a number of Neisseria and Haemophilus species (FIGS. 12A to 12G; SEQ ID NOS: 21, 22, 23 and 24).

Prior to the present discovery, tbpA genes identified in species of Neisseria, Haemophilus, and Actinobacillus had been found to be preceded by a tbpB gene with several conserved regions. The two genes typically are separated by a short intergenic sequence. However, a tbpB gene was not found upstream of the tbpA gene in M. catarrhalis 4223. In order to localize the tbpB gene within the 13.2 kb insert of clone LEM3-24, a degenerate oligonucleotide probe was synthesized based upon an amino acid sequence EGGFYGP (SEQ ID NO: 30), conserved among Tbp2 proteins of several species. The oligonucleotide was labelled and used to probe a Southern blot containing different restriction endonuclease fragments of clone LEM3-24. The probe hybridized to a 5.5 kb NheI-SalI fragment, which subsequently was subcloned into pBR328, and sequenced. The fragment contained most of the putative tbpB gene, with the exception of the promoter region. The clone LEM3-24 was sequenced to obtain the remaining upstream sequence. The tbpB gene was located approximately 3 kb downstream from the end of the tbpA gene, in contrast to the genetic organization of the tbpA and tbpB genes in Haemophilus and Neisseria. The nucleotide sequence (SEQ ID NOS: 3 and 4) of the tbpB gene from M. catarrhalis 4223 and the deduced amino acid sequence (SEQ ID NOS: 11 and 12) are shown in FIGS. 6A to 6G.
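To illustrate what a degenerate oligonucleotide based on the peptide EGGFYGP involves, the following sketch back-translates the peptide with the standard genetic code and reports both an IUPAC-coded coding-strand sequence and the number of distinct oligonucleotides it represents. The actual probe sequence is not given in this passage, so the output is only a fully degenerate illustration: the probe actually used may have been the complementary strand, may have contained inosine, or may have used a reduced set of codons. The function names are hypothetical. Collapsing codons position by position is exact for the amino acids in this peptide but would over-represent amino acids with six codons (Leu, Ser, Arg).

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS: dict[str, list[str]] = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(codon))

# IUPAC ambiguity codes keyed by the sorted set of bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def degenerate_oligo(peptide: str) -> str:
    """Coding-strand IUPAC sequence covering every codon of each residue."""
    out = []
    for aa in peptide:
        codons = CODONS[aa]
        for pos in range(3):
            key = "".join(sorted({c[pos] for c in codons}))
            out.append(IUPAC[key])
    return "".join(out)


def degeneracy(peptide: str) -> int:
    """Number of distinct coding sequences for the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


print(degenerate_oligo("EGGFYGP"), degeneracy("EGGFYGP"))
```

Under these assumptions the seven-residue peptide corresponds to a 21-base oligonucleotide pool of 2 × 4 × 4 × 2 × 2 × 4 × 4 = 2048 sequences.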

The tbpB gene from M. catarrhalis Q8 was also cloned and sequenced. The nucleotide sequence (SEQ ID NOS: 7 and 8) and the deduced amino acid sequence (SEQ ID NOS: 15 and 16) are shown in FIGS. 11A to 11O.

The tbpB genes from M. catarrhalis R1, 3, M35 and LES1 were also cloned and sequenced. (Strain 3 is an isolate provided by Dr. T. Murphy; strain R1 was a gift from Dr. M. G. Bergeron; strain M35 was obtained from Dr. G. D. Campbell (Louisiana State University, Shreveport, La.) and strain LES1 was obtained from Dr. L. Stanfors (University of Tromso, Norway).) FIGS. 27A to 27K, 31A to 31G, 32A to 32K and 33A to 33K show the nucleotide sequence of the tbpB gene (SEQ ID NOS: 45, 47, 48, 50 and 52) and deduced amino acid sequence of the Tbp2 protein (SEQ ID NOS: 46, 49, 51 and 53) of the M. catarrhalis strains M35, R1, 3 and LES1 respectively. Regions of homology are evident between the M. catarrhalis Tbp2 amino acid sequences, as shown in the comparative alignment of FIGS. 34A to 34D (SEQ ID NOS: 11, 15, 46, 49, 51 and 53), and between the M. catarrhalis Tbp2 amino acid sequences and the Tbp2 sequences of a number of Neisseria and Haemophilus species, as shown in the comparative alignment in FIGS. 13A to 13F (SEQ ID NOS: 25, 26, 27 and 28). Underlining in FIGS. 34A to 34D indicates those sequences which are conserved among the M. catarrhalis Tbp2 proteins and those of A. pleuropneumoniae, H. influenzae, N. gonorrhoeae, N. meningitidis and P. haemolytica.

Cloned tbpA and tbpB genes were expressed in E. coli to produce recombinant Tbp1 and Tbp2 proteins free of other Moraxella proteins. These recombinant proteins were purified and used for immunization.

The antigenic conservation of Tbp2 protein amongst strains of M. catarrhalis was demonstrated by separation of the proteins in whole cell lysates of M. catarrhalis or strains of E. coli expressing recombinant Tbp2 proteins by SDS-PAGE and antiserum immunoblotting with anti-4223 rTbp2 antiserum or anti-Q8 rTbp2 antiserum raised in guinea pigs. M. catarrhalis strains 3, 56, 135, 585, 4223, 5191, 8185 and ATCC 25240 were tested in this way and all showed specific reactivity with anti-4223 rTbp2 or anti-Q8 rTbp2 antibody (FIG. 25).

Sequence analysis indicated that at least two families could be identified for M. catarrhalis tbpB genes, one comprising strains 4223, R1 and M35 and the other containing strains Q8 and 3, with strain LES1 being equally related to both families. Anti-rTbp2 bactericidal antibody activity (Table 4) correlated with the putative gene families identified by sequencing.

In addition, the ability of anti-rTbp2 antibodies from one strain to recognize native or recombinant protein from the homologous or heterologous strain by ELISA is shown in Table 1 below.

Amino acid sequencing of the N-termini and cyanogen bromide fragments of transferrin receptor from M. catarrhalis 4223 was undertaken. The N-termini of both Tbp1 and Tbp2 were blocked. The putative signal sequences of Tbp1 and Tbp2 are indicated by underlining in FIGS. 5A to 5J and 6A to 6G (SEQ ID NOS: 19 and 20) respectively. The deduced amino acid sequence for the N-terminal region of Tbp2 suggests a lipoprotein structure.

Results shown in Tables 1 and 2 below illustrate the ability of anti-Tbp1 and anti-Tbp2 guinea pig antisera, produced by immunization with Tbp1 or Tbp2, to lyse M. catarrhalis. The results show that the antisera produced by immunization with Tbp1 or Tbp2 protein isolated from M. catarrhalis isolate 4223 were bactericidal against a homologous non-clumping M. catarrhalis strain RH408, derived from isolate 4223 (a strain previously deposited in connection with U.S. patent application Ser. No. 08/328,589, assigned to the assignee hereof (WO 96/12733), with the American Type Culture Collection, located at 12301 Parklawn Drive, Rockville, Md. 20852, USA under the terms of the Budapest Treaty on Dec. 13, 1994 under ATCC Deposit No. 55,637). In addition, antisera produced by immunization with Tbp1 protein isolated from M. catarrhalis 4223 were bactericidal against the heterologous non-clumping strain Q8. Furthermore, antiserum raised against recombinant Tbp2 (rTbp2) protein was bactericidal against the homologous strain of M. catarrhalis.

The ability of isolated and purified transferrin binding proteins to generate bactericidal antibodies is in vivo evidence of utility of these proteins as vaccines to protect against disease caused by Moraxella.

Thus, in accordance with another aspect of the present invention, there is provided a vaccine against infection caused by Moraxella strains, comprising an immunogenically-effective amount of a transferrin binding protein from a strain of Moraxella and a physiologically-acceptable carrier therefor. Vaccine preparations may comprise antigenically or sequence divergent transferrin binding proteins.

The transferrin binding protein provided herein is useful as a diagnostic reagent, as an antigen for the generation of anti-transferrin protein binding antibodies, as an antigen for vaccination against the disease caused by species of Moraxella and for detecting infection by Moraxella and other such bacteria.

The transferrin binding protein provided herein may also be used as a carrier protein for haptens, polysaccharides or peptides to make conjugate vaccines against antigenic determinants unrelated to transferrin binding proteins. In additional embodiments of the present invention, therefore, the transferrin binding protein as provided herein may be used as a carrier molecule to prepare chimeric molecules and conjugate vaccines (including glycoconjugates) against pathogenic bacteria, including encapsulated bacteria. Thus, for example, glycoconjugates of the present invention may be used to confer protection against disease and infection caused by any bacteria having polysaccharide antigens including lipooligosaccharides (LOS) and PRP. Such bacterial pathogens may include, for example, Haemophilus influenzae, Streptococcus pneumoniae, Escherichia coli, Neisseria meningitidis, Salmonella typhi, Streptococcus mutans, Cryptococcus neoformans, Klebsiella, Staphylococcus aureus and Pseudomonas aeruginosa. Particular antigens which can be conjugated to transferrin binding protein and methods to achieve such conjugations are described in U.S. patent application Ser. No. 08/433,522 filed Nov. 23, 1993 (WO 94/12641), assigned to the assignee hereof and the disclosure of which is hereby incorporated by reference thereto.

In another embodiment, the carrier function of transferrin binding protein may be used, for example, to induce an immune response against abnormal polysaccharides of tumour cells, or to produce anti-tumour antibodies that can be conjugated to chemotherapeutic or bioactive agents.

Additional sequence analysis of the entire tbpA-tbpB locus of M. catarrhalis strains 4223 and Q8 (FIGS. 35A to 35M′) identified an intergenic open reading frame termed “orf3” (SEQ ID NO: 54, nucleotide sequence of intergenic region; SEQ ID NO: 55, orf3 coding sequence; SEQ ID NO: 56, ORF3 amino acid sequence) (see also FIGS. 2 and 7 for location of orf3). The encoded ORF3 proteins from 4223 and Q8 are 98% identical, as seen from the sequence alignment of FIG. 36 (SEQ ID NOS: 56, 57).

The invention extends to transferrin binding proteins from Moraxella catarrhalis for use as an active ingredient in a vaccine against disease caused by infection with Moraxella. The invention also extends to a pharmaceutical vaccinal composition containing transferrin binding proteins from Moraxella catarrhalis and optionally, a pharmaceutically acceptable carrier and/or diluent.

In a further aspect the invention provides the use of transferrin binding proteins for the preparation of a pharmaceutical vaccinal composition for immunization against disease caused by infection with Moraxella.

It is clearly apparent to one skilled in the art that the various embodiments of the present invention have many applications in the fields of vaccination, diagnosis and treatment of, for example, Moraxella infections, and the generation of immunological and other diagnostic reagents. A further non-limiting discussion of such uses is presented below.

1. Vaccine Preparation and Use

Immunogenic compositions, suitable to be used as vaccines, may be prepared from immunogenic transferrin receptor proteins, analogs and fragments thereof encoded by the nucleic acid molecules as well as the nucleic acid molecules disclosed herein. The vaccine elicits an immune response which produces antibodies, including anti-transferrin receptor antibodies and antibodies that are opsonizing or bactericidal. Should the vaccinated subject be challenged by Moraxella, the antibodies bind to the transferrin receptor and thereby prevent access of the bacteria to an iron source which is required for viability. Furthermore, opsonizing or bactericidal anti-transferrin receptor antibodies may also provide protection by alternative mechanisms.

Immunogenic compositions, including vaccines, may be prepared as injectables, as liquid solutions or emulsions. The transferrin receptor proteins, analogs and fragments thereof and encoding nucleic acid molecules may be mixed with pharmaceutically acceptable excipients which are compatible with the transferrin receptor proteins, fragments, analogs or nucleic acid molecules. Such excipients may include water, saline, dextrose, glycerol, ethanol, and combinations thereof. The immunogenic compositions and vaccines may further contain auxiliary substances, such as wetting or emulsifying agents, pH buffering agents, or adjuvants, to enhance the effectiveness of the vaccines. Immunogenic compositions and vaccines may be administered parenterally, by injection subcutaneously, intradermally or intramuscularly. Alternatively, the immunogenic compositions provided according to the present invention may be formulated and delivered in a manner to evoke an immune response at mucosal surfaces. Thus, the immunogenic composition may be administered to mucosal surfaces by, for example, the nasal or oral (intragastric) routes. The immunogenic composition may be provided in combination with a targeting molecule for delivery to specific cells of the immune system or to mucosal surfaces. Some such targeting molecules include vitamin B12 and fragments of bacterial toxins, as described in WO 92/17167 (Biotech Australia Pty. Ltd.), and monoclonal antibodies, as described in U.S. Pat. No. 5,194,254 (Barber et al). Alternatively, other modes of administration, including suppositories and oral formulations, may be desirable. For suppositories, binders and carriers may include, for example, polyalkylene glycols or triglycerides. Oral formulations may include normally employed excipients such as, for example, pharmaceutical grades of saccharine, cellulose and magnesium carbonate. These compositions may take the form of solutions, suspensions, tablets, pills, capsules, sustained release formulations or powders and contain about 1 to 95% of the transferrin receptor proteins, fragments, analogs and/or nucleic acid molecules.

The vaccines are administered in a manner compatible with the dosage formulation, and in such amount as will be therapeutically effective, protective and immunogenic. The quantity to be administered depends on the subject to be treated, including, for example, the capacity of the individual's immune system to synthesize antibodies, and, if needed, to produce a cell-mediated immune response. Precise amounts of active ingredient required to be administered depend on the judgment of the practitioner. However, suitable dosage ranges are readily determinable by one skilled in the art and may be of the order of micrograms of the transferrin receptor proteins, analogs and fragments thereof and/or nucleic acid molecules. Suitable regimes for initial administration and booster doses are also variable, but may include an initial administration followed by subsequent administrations. The dosage of the vaccine may also depend on the route of administration and will vary according to the size of the host.

The nucleic acid molecules encoding the transferrin receptor of Moraxella may be used directly for immunization by administration of the DNA directly, for example, by injection for genetic immunization or by constructing a live vector, such as Salmonella, BCG, adenovirus, poxvirus, vaccinia or poliovirus containing the nucleic acid molecules. A discussion of some live vectors that have been used to carry heterologous antigens to the immune system is contained in, for example, O'Hagan (ref. 22). Processes for the direct injection of DNA into test subjects for genetic immunization are described in, for example, Ulmer et al. (ref. 23).

Immunogenicity can be significantly improved if the antigens are co-administered with adjuvants, commonly used as a 0.05 to 1.0 percent solution in phosphate-buffered saline. Adjuvants enhance the immunogenicity of an antigen but are not necessarily immunogenic themselves. Adjuvants may act by retaining the antigen locally near the site of administration to produce a depot effect facilitating a slow, sustained release of antigen to cells of the immune system. Adjuvants can also attract cells of the immune system to an antigen depot and stimulate such cells to elicit immune responses.

Immunostimulatory agents or adjuvants have been used for many years to improve the host immune responses to, for example, vaccines. Intrinsic adjuvants, such as lipopolysaccharides, normally are the components of killed or attenuated bacteria used as vaccines. Extrinsic adjuvants are immunomodulators which are typically non-covalently linked to antigens and are formulated to enhance the host immune responses. Thus, adjuvants have been identified that enhance the immune response to antigens delivered parenterally. Some of these adjuvants are toxic, however, and can cause undesirable side-effects, making them unsuitable for use in humans and many animals. Indeed, only aluminum hydroxide and aluminum phosphate (collectively commonly referred to as alum) are routinely used as adjuvants in human and veterinary vaccines. The efficacy of alum in increasing antibody responses to diphtheria and tetanus toxoids is well established and an HBsAg vaccine has been adjuvanted with alum. While the usefulness of alum is well established for some applications, it has limitations. For example, alum is ineffective for influenza vaccination and inconsistently elicits a cell mediated immune response. The antibodies elicited by alum-adjuvanted antigens are mainly of the IgG1 isotype in the mouse, which may not be optimal for protection by some vaccinal agents.

A wide range of extrinsic adjuvants can provoke potent immune responses to antigens. These include saponins complexed to membrane protein antigens (immune stimulating complexes), pluronic polymers with mineral oil, killed mycobacteria and mineral oil, Freund's complete adjuvant, bacterial products, such as muramyl dipeptide (MDP) and lipopolysaccharide (LPS), as well as lipid A, and liposomes.

To efficiently induce humoral immune responses (HIR) and cell-mediated immunity (CMI), immunogens are often emulsified in adjuvants. Many adjuvants are toxic, inducing granulomas, acute and chronic inflammations (Freund's complete adjuvant, FCA), cytolysis (saponins and pluronic polymers) and pyrogenicity, arthritis and anterior uveitis (LPS and MDP). Although FCA is an excellent adjuvant and widely used in research, it is not licensed for use in human or veterinary vaccines because of its toxicity.

Desirable characteristics of ideal adjuvants include:

(1) lack of toxicity;

(2) ability to stimulate a long-lasting immune response;

(3) simplicity of manufacture and stability in long-term storage;

(4) ability to elicit both CMI and HIR to antigens administered by various routes, if required;

(5) synergy with other adjuvants;

(6) capability of selectively interacting with populations of antigen presenting cells (APC);

(7) ability to specifically elicit appropriate T_(H)1 or T_(H)2 cell-specific immune responses; and

(8) ability to selectively increase appropriate antibody isotype levels (for example, IgA) against antigens.

U.S. Pat. No. 4,855,283 granted to Lockhoff et al on Aug. 8, 1989, which is incorporated herein by reference thereto, teaches glycolipid analogues including N-glycosylamides, N-glycosylureas and N-glycosylcarbamates, each of which is substituted in the sugar residue by an amino acid, as immuno-modulators or adjuvants. Thus, Lockhoff et al. 1991 (ref. 24) reported that N-glycolipid analogs displaying structural similarities to the naturally-occurring glycolipids, such as glycophospholipids and glycoglycerolipids, are capable of eliciting strong immune responses in both herpes simplex virus vaccine and pseudorabies virus vaccine. Some glycolipids have been synthesized from long chain-alkylamines and fatty acids that are linked directly with the sugars through the anomeric carbon atom, to mimic the functions of the naturally occurring lipid residues.

U.S. Pat. No. 4,258,029 granted to Moloney, assigned to the assignee hereof and incorporated herein by reference thereto, teaches that octadecyl tyrosine hydrochloride (OTH) functions as an adjuvant when complexed with tetanus toxoid and formalin inactivated type I, II and III poliomyelitis virus vaccine. Also, Nixon-George et al. 1990 (ref. 25) reported that octadecyl esters of aromatic amino acids complexed with a recombinant hepatitis B surface antigen enhanced the host immune responses against hepatitis B virus.

2. Immunoassays

The transferrin receptor proteins, analogs and/or fragments thereof of the present invention are useful as immunogens, as antigens in immunoassays including enzyme-linked immunosorbent assays (ELISA), RIAs and other non-enzyme linked antibody binding assays or procedures known in the art for the detection of anti-Moraxella, transferrin receptor protein antibodies. In ELISA assays, the transferrin receptor protein, analogs and/or fragments corresponding to portions of TfR protein, are immobilized onto a selected surface, for example, a surface capable of binding proteins or peptides such as the wells of a polystyrene microtiter plate. After washing to remove incompletely adsorbed transferrin receptor, analogs and/or fragments, a non-specific protein such as a solution of bovine serum albumin (BSA) or casein that is known to be antigenically neutral with regard to the test sample may be bound to the selected surface. This allows for blocking of nonspecific adsorption sites on the immobilizing surface and thus reduces the background caused by non-specific bindings of antisera onto the surface.

The immobilizing surface is then contacted with a sample, such as clinical or biological materials, to be tested in a manner conducive to immune complex (antigen/antibody) formation. This procedure may include diluting the sample with diluents, such as BSA, bovine gamma globulin (BGG) and/or phosphate buffered saline (PBS)/Tween. The sample is then allowed to incubate for about 2 to 4 hours, at temperatures of the order of about 25° to 37° C. Following incubation, the sample-contacted surface is washed to remove non-immunocomplexed material. The washing procedure may include washing with a solution such as PBS/Tween or a borate buffer.

Following formation of specific immunocomplexes between the test sample and the bound transferrin receptor protein, analogs and/or fragments and subsequent washing, the occurrence, and even amount, of immunocomplex formation may be determined by subjecting the immunocomplex to a second antibody having specificity for the first antibody. If the test sample is of human origin, the second antibody is an antibody having specificity for human immunoglobulins and in general IgG. To provide detecting means, the second antibody may have an associated activity such as an enzymatic activity that will generate, for example, a color development upon incubating with an appropriate chromogenic substrate. Quantification may then be achieved by measuring the degree of color generation using, for example, a spectrophotometer.

3. Use of Sequences as Hybridization Probes

The nucleotide sequences of the present invention, comprising the sequence of the transferrin receptor gene, now allow for the identification and cloning of the transferrin receptor genes from any species of Moraxella.

The nucleotide sequences comprising the sequence of the transferrin receptor genes of the present invention are useful for their ability to selectively form duplex molecules with complementary stretches of other TfR genes. Depending on the application, a variety of hybridization conditions may be employed to achieve varying degrees of selectivity of the probe toward the other TfR genes. For a high degree of selectivity, relatively stringent conditions are used to form the duplexes, such as low salt and/or high temperature conditions, such as provided by 0.02 M to 0.15 M NaCl at temperatures of between about 50° C. and 70° C. For some applications, less stringent hybridization conditions are required, such as 0.15 M to 0.9 M salt, at temperatures ranging from about 20° C. to 55° C. Hybridization conditions can also be rendered more stringent by the addition of increasing amounts of formamide, to destabilize the hybrid duplex. Thus, particular hybridization conditions can be readily manipulated, and will generally be a method of choice depending on the desired results. In general, convenient hybridization temperatures in the presence of 50% formamide are: 42° C. for a probe which is 95 to 100% homologous to the target fragment, 37° C. for 90 to 95% homology and 32° C. for 85 to 90% homology.
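The temperature guidance in the preceding paragraph can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and because the stated bands share their end points (90% and 95%), the sketch assumes a boundary value is assigned to the higher-homology band.

```python
def hybridization_temp_c(percent_homology: float) -> int:
    """Return the convenient hybridization temperature (deg C) in 50% formamide.

    Bands follow the text: 95-100% -> 42, 90-95% -> 37, 85-90% -> 32.
    Assumption (not in the text): shared end points go to the higher band.
    """
    if not 85 <= percent_homology <= 100:
        raise ValueError("the text gives temperatures only for 85 to 100% homology")
    if percent_homology >= 95:
        return 42
    if percent_homology >= 90:
        return 37
    return 32


if __name__ == "__main__":
    for homology in (98, 92, 87):
        print(homology, "% homology ->", hybridization_temp_c(homology), "deg C")
```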

In a clinical diagnostic embodiment, the nucleic acid sequences of the TfR genes of the present invention may be used in combination with an appropriate means, such as a label, for determining hybridization. A wide variety of appropriate indicator means are known in the art, including radioactive, enzymatic or other ligands, such as avidin/biotin and digoxigenin-labelling, which are capable of providing a detectable signal. In some diagnostic embodiments, an enzyme tag such as urease, alkaline phosphatase or peroxidase, instead of a radioactive tag, may be used. In the case of enzyme tags, colorimetric indicator substrates are known which can be employed to provide a means visible to the human eye or spectrophotometrically, to identify specific hybridization with samples containing TfR gene sequences.

The nucleic acid sequences of TfR genes of the present invention are useful as hybridization probes in solution hybridizations and in embodiments employing solid-phase procedures. In embodiments involving solid-phase procedures, the test DNA (or RNA) from samples, such as clinical samples, including exudates, body fluids (e.g., serum, amniotic fluid, middle ear effusion, sputum, bronchoalveolar lavage fluid) or even tissues, is adsorbed or otherwise affixed to a selected matrix or surface. The fixed, single-stranded nucleic acid is then subjected to specific hybridization with selected probes comprising the nucleic acid sequences of the TfR genes or fragments thereof of the present invention under desired conditions. The selected conditions will depend on the particular circumstances based on the particular criteria required depending on, for example, the G+C contents, type of target nucleic acid, source of nucleic acid, size of hybridization probe etc. Following washing of the hybridization surface so as to remove non-specifically bound probe molecules, specific hybridization is detected, or even quantified, by means of the label. It is preferred to select nucleic acid sequence portions which are conserved among species of Moraxella. The selected probe may be at least 18 bp and may be in the range of about 30 to 90 bp.

4. Expression of the Transferrin Receptor Genes

Plasmid vectors containing replicon and control sequences which are derived from species compatible with the host cell may be used for the expression of the transferrin receptor genes in expression systems. The vector ordinarily carries a replication site, as well as marking sequences which are capable of providing phenotypic selection in transformed cells. For example, E. coli may be transformed using pBR322 which contains genes for ampicillin and tetracycline resistance and thus provides easy means for identifying transformed cells. The pBR322 plasmid, or other microbial plasmid or phage, must also contain, or be modified to contain, promoters which can be used by the host cell for expression of its own proteins.

In addition, phage vectors containing replicon and control sequences that are compatible with the host can be used as a transforming vector in connection with these hosts. For example, the phage in lambda GEM™-11 may be utilized in making recombinant phage vectors which can be used to transform host cells, such as E. coli LE392.

Promoters commonly used in recombinant DNA construction include the β-lactamase (penicillinase) and lactose promoter systems and other microbial promoters, such as the T7 promoter system as described in U.S. Pat. No. 4,952,496. Details concerning the nucleotide sequences of promoters are known, enabling a skilled worker to ligate them functionally with genes. The particular promoter used will generally be a matter of choice depending upon the desired results. Hosts that are appropriate for expression of the transferrin receptor genes, fragments, analogs or variants thereof, may include E. coli, Bacillus species, Haemophilus, fungi, yeast, Moraxella and Bordetella, or the baculovirus expression system may be used.

In accordance with this invention, it is preferred to make the transferrin receptor protein, fragment or analog thereof, by recombinant methods, particularly since the naturally occurring TfR protein as purified from a culture of a species of Moraxella may include trace amounts of toxic materials or other contaminants. This problem can be avoided by using recombinantly produced TfR protein in heterologous systems which can be isolated from the host in a manner to minimize contaminants in the purified material. Particularly desirable hosts for expression in this regard include Gram positive bacteria which do not have LPS and are, therefore, endotoxin free. Such hosts include species of Bacillus and may be particularly useful for the production of non-pyrogenic transferrin receptor, fragments or analogs thereof. Furthermore, recombinant methods of production permit the manufacture of Tbp1 or Tbp2 or respective analogs or fragments thereof, separate from one another, which is distinct from the normal combined proteins present in Moraxella.

Sequence Alignment and Analysis

Sequence alignments were performed using the ALIGN (Trademark) or GENALIGN (Trademark) computer programs (IntelliGenetics Suite 5.4, Oxford Molecular). ALIGN® uses the Needleman-Wunsch algorithm (ref. 32) and its later modifications to locate regions of similarity between two sequences using the default parameters of the program. Finding regions of maximum similarity between two sequences can be solved in a rigorous manner using the iterative matrix calculation of the Needleman and Wunsch 1970 algorithm. The analysis is restricted to regions with no internal deletions or insertions, joined by a minimum number of loop-outs or deletions. Sellers (ref. 33) developed a true metric measure of the “distance” between sequences and Waterman (ref. 34) extended this algorithm to include insertions and deletions of arbitrary length. Smith (ref. 35) improved the early algorithms to find the subsequences of maximum similarity. The algorithm has been used to analyze sequences as long as 5000 bases by dividing these sequences into segments of 200 to 400 bases, and then reassembling them into a final best match. This method of dividing the sequence and then reassembling it has proven quite robust. The algorithm permits specification of the size of the segment which the program searches for similarities. The program then assembles the segments after checking overlaps of adjacent subsequences. The weighting of deletions and the relative size of overlaps may be controlled. The program displays the results to show the differences in closely related sequences.
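The iterative matrix calculation referred to above can be illustrated with a minimal sketch of Needleman-Wunsch global alignment. It is not the ALIGN program: the scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name are assumptions chosen for illustration, and the segment-wise division and reassembly described in the text are not reproduced.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b).

    Scoring parameters are illustrative assumptions, not ALIGN defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the matrix: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(s)
    print(top)
    print(bottom)
```

Gaps appear as "-" in the returned strings, so stripping them recovers the two input sequences.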

GENALIGN® is a multiple alignment program. Up to 99 sequences may be analyzed for alignment using the Martinez/Regions (ref. 36) or Needleman-Wunsch (ref. 32) method. GENALIGN places the sequences in an order that puts the most closely aligned sequence pairs adjacent to each other. A consensus sequence is displayed under the multiple sequence alignments. The sequences used in developing the consensus sequence can be filed for use in other programs. GENALIGN allows the parameters of the search to be changed so that alternate alignments of the sequences can be formed.

Biological Deposits

Certain vectors that contain at least a portion coding for a transferrin receptor protein from Moraxella catarrhalis strains 4223 and Q8, and a strain of M. catarrhalis RH408, that are described and referred to herein have been deposited with the American Type Culture Collection (ATCC) located at 12301 Parklawn Drive, Rockville, Md., USA, pursuant to the Budapest Treaty and prior to the filing of this application. Samples of the deposited vectors and bacterial strain will become available to the public and the restrictions imposed on access to the deposits will be removed upon grant of a patent based upon this United States patent application. In addition, the deposit will be replaced if viable samples cannot be dispensed by the Depository. The invention described and claimed herein is not to be limited in scope by the biological materials deposited, since the deposited embodiment is intended only as an illustration of the invention. Any equivalent or similar vectors or strains that encode similar or equivalent antigens as described in this application are within the scope of the invention.

Deposit Summary

DEPOSIT              ATCC DESIGNATION    DATE DEPOSITED
Phage LEM3-24        97,381              December 4, 1995
Phage SLRD-A         97,380              December 4, 1995
Plasmid pLEM29       97,461              March 8, 1996
Plasmid pSLRD35A     97,833              January 13, 1997
Plasmid pLEM37       97,834              January 13, 1997
Strain RH408         55,637              December 9, 1994

EXAMPLES

The above disclosure generally describes the present invention. A more complete understanding can be obtained by reference to the following specific Examples. These Examples are described solely for purposes of illustration and are not intended to limit the scope of the invention. Changes in form and substitution of equivalents are contemplated as circumstances may suggest or render expedient. Although specific terms have been employed herein, such terms are intended in a descriptive sense and not for purposes of limitations.

Methods of molecular genetics, protein biochemistry and immunology used but not explicitly described in this disclosure and these Examples are amply reported in the scientific literature and are well within the ability of those skilled in the art.

Example 1

This Example illustrates the preparation of Tbp1 and Tbp2 proteins from M. catarrhalis and the immunization of guinea pigs with these proteins.

Tbp1 and Tbp2 proteins were obtained as follows:

Iron-starved crude total membrane preparations were diluted to 4 mg protein/ml in 50 mM Tris.HCl-1M NaCl, pH 8, in a total volume of 384 ml. Membranes were solubilized by the addition of 8 ml each of 0.5M EDTA and 30% sarkosyl and samples were incubated for 2 hours at room temperature, with gentle agitation. Solubilized membranes were centrifuged at 10K rpm for 20 min. 15 ml of apo-hTf-Sepharose 4B were added to the supernatant, and incubated for 2 hours at room temperature, with gentle shaking. The mixture was added into a column. The column was washed with 50 ml of 50 mM Tris.HCl-1 M NaCl-250 mM guanidine hydrochloride, to remove contaminating proteins. Tbp2 was eluted from the column by the addition of 100 ml of 1.5M guanidine hydrochloride. Tbp1 was eluted by the addition of 100 ml of 3M guanidine hydrochloride. The first 20 ml fractions were dialyzed against 3 changes of 50 mM Tris.HCl, pH 8.0. Samples were stored at −20° C., or dialyzed against ammonium bicarbonate and lyophilized.
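As a worked check of the solubilization step, the final EDTA and sarkosyl concentrations follow from simple dilution of the stocks into the combined volume. The sketch below uses only the volumes and stock strengths stated above; the helper name is hypothetical, and it assumes the volumes are additive.

```python
def final_concentration(stock_conc, stock_volume_ml, total_volume_ml):
    """Concentration after diluting a stock into a larger volume (same units as stock)."""
    return stock_conc * stock_volume_ml / total_volume_ml


membrane_volume_ml = 384.0   # diluted membrane preparation
edta_volume_ml = 8.0         # 0.5 M EDTA stock
sarkosyl_volume_ml = 8.0     # 30% sarkosyl stock

# Assumption: volumes are additive on mixing.
total_volume_ml = membrane_volume_ml + edta_volume_ml + sarkosyl_volume_ml

edta_final_mM = final_concentration(500.0, edta_volume_ml, total_volume_ml)
sarkosyl_final_pct = final_concentration(30.0, sarkosyl_volume_ml, total_volume_ml)
total_protein_mg = 4.0 * membrane_volume_ml  # 4 mg protein/ml in 384 ml

if __name__ == "__main__":
    print("total volume (ml):", total_volume_ml)
    print("final EDTA (mM):", edta_final_mM)
    print("final sarkosyl (%):", sarkosyl_final_pct)
    print("total membrane protein (mg):", total_protein_mg)
```

Under these assumptions the mixture comes to 400 ml, with about 10 mM EDTA and 0.6% sarkosyl.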

Guinea pigs (Charles River) were immunized intramuscularly on day +1 with a 10 μg dose of Tbp1 or Tbp2 emulsified in complete Freund's adjuvant. Animals were boosted on days +14 and +29 with the same dose of protein emulsified in incomplete Freund's adjuvant. Blood samples were taken on day +42, and sera were used for analysis of bactericidal antibody activity. In addition, all antisera were assessed by immunoblot analysis for reactivity with M. catarrhalis 4223 proteins.

The bactericidal antibody activity of guinea pig anti-M. catarrhalis 4223 Tbp1 or Tbp2 antisera was determined as follows. A non-clumping M. catarrhalis strain RH408, derived from isolate 4223, was inoculated into 20 ml of BHI broth, and grown for 18 hr at 37° C., shaking at 170 rpm. One ml of this culture was used to inoculate 20 ml of BHI supplemented with 25 mM ethylenediamine-di-hydroxyphenylacetic acid (EDDA; Sigma). The culture was grown to an OD₅₇₈ of 0.5. The cells were diluted 1:200,000 in 140 mM NaCl, 93 mM NaHCO₃, 2 mM Na barbiturate, 4 mM barbituric acid, 0.5 mM MgCl₂.6H₂O, 0.4 mM CaCl₂.2H₂O, pH 7.6 (Veronal buffer), containing 0.1% bovine serum albumin (VBS) and placed on ice. Guinea pig anti-M. catarrhalis 4223 Tbp1 or Tbp2 antisera, along with prebleed control antisera, were heated to 56° C. for 30 min. to inactivate endogenous complement. Serial twofold dilutions of each antiserum in VBS were added to the wells of a 96-well Nunclon microtitre plate (Nunc, Roskilde, Denmark). Dilutions started at 1:8, and were prepared to a final volume of 25 μL in each well. 25 μL of diluted bacterial cells were added to each of the wells. A guinea pig complement (Biowhittaker, Walkersville, Md.) was diluted 1:10 in VBS, and 25 μL portions were added to each well. The plates were incubated at 37° C. for 60 min, gently shaking at 70 rpm on a rotary platform. 50 μL of each reaction mixture were plated onto Mueller Hinton (Becton-Dickinson, Cockeysville, Md.) agar plates. The plates were incubated at 37° C. for 72 hr and the number of colonies per plate was counted. Bactericidal titres were assessed as the reciprocal of the highest dilution of antiserum capable of killing greater than 50% of bacteria compared with controls containing pre-immune sera. Results shown in Table 1 below illustrate the ability of the anti-Tbp1 and anti-Tbp2 guinea pig antisera to lyse M. catarrhalis.
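The titre definition at the end of this procedure can be sketched as follows. The function name and the colony counts in the usage example are hypothetical and are not data from this disclosure; the sketch assumes counts are listed in order of the serial twofold dilutions starting at 1:8, and it reports the nominal antiserum dilution without correcting for the cells and complement added to each well.

```python
def bactericidal_titre(colony_counts, control_count, first_dilution=8):
    """Reciprocal of the highest antiserum dilution killing more than 50% of bacteria.

    colony_counts: colonies per plate for serial twofold dilutions, in order,
                   beginning at 1:first_dilution.
    control_count: colonies on the pre-immune serum control plate.
    Returns None when no dilution kills more than 50%.
    """
    titre = None
    dilution = first_dilution
    for count in colony_counts:
        killed_fraction = 1 - count / control_count
        if killed_fraction > 0.5:
            titre = dilution  # keep the most dilute serum that still meets the cut-off
        dilution *= 2
    return titre


if __name__ == "__main__":
    # Hypothetical counts for dilutions 1:8, 1:16, 1:32, 1:64, 1:128.
    example_counts = [5, 10, 30, 60, 95]
    print(bactericidal_titre(example_counts, control_count=100))
```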

Example 2

This Example illustrates the preparation of chromosomal DNA from M. catarrhalis strains 4223 and Q8.

M. catarrhalis isolate 4223 was inoculated into 100 ml of BHI broth, and incubated for 18 hr at 37° C. with shaking. The cells were harvested by centrifugation at 10,000×g for 20 min. The pellet was used for extraction of M. catarrhalis 4223 chromosomal DNA.

The cell pellet was resuspended in 20 ml of 10 mM Tris-HCl (pH 7.5)-1.0 mM EDTA (TE). Pronase and SDS were added to final concentrations of 500 μg/ml and 1.0%, respectively, and the suspension was incubated at 37° C. for 2 hr. After several sequential extractions with phenol, phenol:chloroform (1:1), and chloroform:isoamyl alcohol (24:1), the aqueous extract was dialysed, at 4° C., against 1.0 M NaCl for 4 hr, and against TE (pH 7.5) for a further 48 hr with three buffer changes. Two volumes of ethanol were added to the dialysate, and the DNA was spooled onto a glass rod. The DNA was allowed to air-dry, and was dissolved in 3.0 ml of water. Concentration was estimated, by UV spectrophotometry, to be about 290 μg/ml.
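The text does not give the absorbance reading behind the 290 μg/ml estimate. As a sketch under stated assumptions, the usual conversion for double-stranded DNA (an A260 of 1.0 in a 1 cm path corresponds to roughly 50 μg/ml) can be applied as below; the absorbance and dilution factor in the example are hypothetical values chosen only to reproduce the stated concentration, and the function names are illustrative.

```python
# Standard approximation for double-stranded DNA at a 1 cm path length.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration of the undiluted sample from an A260 reading."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def total_yield_ug(concentration_ug_per_ml, volume_ml):
    """Total DNA recovered, given the concentration and the dissolved volume."""
    return concentration_ug_per_ml * volume_ml


if __name__ == "__main__":
    # Hypothetical reading: A260 of 0.29 measured on a 1:20 dilution.
    concentration = dsdna_concentration_ug_per_ml(0.29, dilution_factor=20.0)
    print("estimated concentration (ug/ml):", round(concentration, 1))
    # Stated values: about 290 ug/ml dissolved in 3.0 ml of water.
    print("approximate total yield (ug):", total_yield_ug(290.0, 3.0))
```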

M. catarrhalis strain Q8 was grown in BHI broth as described in Example 1. Cells were pelleted from 50 ml of culture by centrifugation at 5000 rpm for 20 minutes, at 4° C. The cell pellet was resuspended in 10 ml of TE (10 mM Tris-HCl, 1 mM EDTA, pH 7.5) and proteinase K and SDS were added to final concentrations of 500 μg/ml and 1%, respectively. The sample was incubated at 37° C. for 4 hours until a clear lysate was obtained. The lysate was extracted twice with Tris-saturated phenol/chloroform (1:1), and twice with chloroform. The final aqueous phase was dialysed for 24 hours against 2×1000 ml of 1 M NaCl at 4° C., changing the buffer once, and for 24 hours against 2×1000 ml of TE at 4° C., changing the buffer once. The final dialysate was precipitated with two volumes of 100% ethanol. The DNA was spooled, dried and resuspended in 5 to 10 ml of TE buffer.

Example 3

This Example illustrates the construction of M. catarrhalis chromosomal libraries in EMBL3.

A series of Sau3A restriction digests of chromosomal DNA, in final volumes of 10 μL each, were carried out in order to optimize the conditions necessary to generate maximal amounts of restriction fragments within a 15 to 23 kb size range. Using the optimized digestion conditions, a large-scale digestion was set up in a 100 μL volume, containing the following: 50 μL of chromosomal DNA (290 μg/ml), 33 μL water, 10 μL 10× Sau3A buffer (New England Biolabs), 1.0 μL BSA (10 mg/ml, New England Biolabs), and 6.3 μL Sau3A (0.04 U/μL). Following a 15 min. incubation at 37° C., the digestion was terminated by the addition of 10 μL of 100 mM Tris-HCl (pH 8.0)-10 mM EDTA-0.1% bromophenol blue-50% glycerol (loading buffer). Digested DNA was electrophoresed through a 0.5% agarose gel in 40 mM Tris acetate-2 mM Na₂EDTA.2H₂O (pH 8.5) (TAE buffer) at 50 V for 6 hr. The region containing restriction fragments within a 15 to 23 kb molecular size range was excised from the gel, and placed into dialysis tubing containing 3.0 ml of TAE buffer. DNA was electroeluted from the gel fragment by applying a field strength of 1.0 V/cm for 18 hr. Electroeluted DNA was extracted once each with phenol and phenol:chloroform (1:1), and precipitated with ethanol. The dried DNA was dissolved in 5.0 μL water.
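The quantities in the large-scale digest can be checked with simple arithmetic. The sketch below uses only the volumes and concentrations stated above; the variable names are illustrative. It shows that the listed components add up to 100.3 μL, slightly above the nominal 100 μL, and gives the DNA mass and enzyme load implied by the recipe.

```python
# Component volumes (in microlitres) as listed for the large-scale Sau3A digest.
volumes_ul = {
    "chromosomal DNA": 50.0,
    "water": 33.0,
    "10x Sau3A buffer": 10.0,
    "BSA": 1.0,
    "Sau3A": 6.3,
}
DNA_STOCK_UG_PER_ML = 290.0   # estimated by UV spectrophotometry in Example 2
SAU3A_UNITS_PER_UL = 0.04     # enzyme concentration stated in the recipe

total_volume_ul = sum(volumes_ul.values())
dna_mass_ug = volumes_ul["chromosomal DNA"] * DNA_STOCK_UG_PER_ML / 1000.0
enzyme_units = volumes_ul["Sau3A"] * SAU3A_UNITS_PER_UL
units_per_ug = enzyme_units / dna_mass_ug

print(f"total volume : {total_volume_ul:.1f} uL")
print(f"DNA in digest: {dna_mass_ug:.1f} ug")
print(f"Sau3A added  : {enzyme_units:.3f} U ({units_per_ug:.4f} U per ug DNA)")
```

The low enzyme-to-DNA ratio is consistent with the stated aim of a partial digest that leaves fragments in the 15 to 23 kb range.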

Size-fractionated chromosomal DNA was ligated with BamHI-digested EMBL3 arms (Promega), using T4 DNA ligase in a final volume of 9 μL. The entire ligation mixture was packaged into lambda phage using a commercial packaging kit (Amersham), following manufacturer's instructions.

The packaged DNA library was amplified on solid media. 0.1 ml aliquots of Escherichia coli strain NM539 in 10 mM MgSO₄ (OD₂₆₀=0.5) were incubated at 37° C. for 15 min. with 15 to 25 μL of the packaged DNA library. Samples were mixed with 3 ml of 0.6% agarose containing 1.0% BBL trypticase peptone-0.5% NaCl (BBL top agarose), and mixtures were plated onto 1.5% agar plates containing 1.0% BBL trypticase peptone-0.5% NaCl, and incubated at 37° C. for 18 hr. 3 ml quantities of 50 mM Tris-HCl (pH 7.5)-8 mM magnesium sulfate heptahydrate-100 mM NaCl-0.01% (w/v) gelatin (SM buffer) were added to each plate, and plates were left at 4° C. for 7 hr. SM buffer containing phage was collected from the plates, pooled together, and stored in a screwcap tube at 4° C., with chloroform.

Chromosomal DNA from M. catarrhalis strain Q8 was digested with Sau3A I (0.1 unit/30 μg DNA) at 37° C. for 30 minutes and size-fractionated on a 0.6% low melting point agarose gel. DNA fragments of 15-23 kb were excised and the DNA was electroeluted for 25 minutes in dialysis tubing containing TAE (40 mM Tris acetate pH 8.5, 2 mM EDTA) at 150 V. The DNA was extracted once with phenol/chloroform (1:1), precipitated, and resuspended in water. The DNA was ligated overnight with EMBL3 BamH I arms (Promega) and the ligation mixture was packaged using the Lambda in vitro packaging kit (Stratagene) and plated onto E. coli LE392 cells. The library was titrated and stored at 4° C. in the presence of 0.3% chloroform.

Example 4

This Example illustrates screening of the M. catarrhalis libraries.

Ten μL aliquots of phage stock from the EMBL3/4223 sample prepared in Example 3 above were combined each with 100 μL of E. coli strain LE392 in 10 mM MgSO₄ (OD₂₆₀=0.5) (plating cells), and incubated at 37° C. for 15 min. The samples were mixed with 3 ml each of BBL top agarose, and the mixtures were poured onto 1.5% agarose plates containing 1% bacto tryptone-0.5% bacto yeast extract-0.05% NaCl (LB agarose; Difco) and supplemented with 200 μM EDDA. The plates were incubated at 37° C. for 18 hr. Plaques were lifted onto nitrocellulose filters (Amersham Hybond-C Extra) using a standard protocol, and the filters were immersed into 5% bovine serum albumin (BSA; Boehringer) in 20 mM Tris-HCl (pH 7.5)-150 mM NaCl (TBS) for 30 min at room temperature, or 4° C. overnight. Filters were incubated for at least 1 hr at room temperature, or 18 hr at 4° C., in TBS containing a 1/1000 dilution of guinea pig anti-M. catarrhalis 4223 Tbp1 antiserum. Following four sequential 10 min. washes in TBS with 0.05% Tween 20 (TBS-Tween), filters were incubated for 30 min. at room temperature in TBS-Tween containing a 1/4000 dilution of recombinant Protein G labelled with horseradish peroxidase (rProtein G-HRP; Zymed). Filters were washed as above, and submerged into CN/DAB substrate solution (Pierce). Color development was arrested by immersing the filters into water. Positive plaques were cored from the plates, and each placed into 0.5 ml of SM buffer containing a few drops of chloroform. The screening procedure was repeated two more times, until 100% of the lifted plaques were positive using the guinea pig anti-M. catarrhalis 4223 Tbp1 antiserum.

The EMBL3/Q8 library was plated onto LE392 cells on YT plates using 0.7% top agar in YT as overlay. Plaques were lifted onto nitrocellulose filters and the filters were probed with oligonucleotide probes labelled with α-³²P-dCTP (Random Primed DNA labeling kit, Boehringer Mannheim). The pre-hybridization was performed in sodium chloride/sodium citrate (SSC) buffer (ref. 27) at 37° C. for 1 hour and the hybridization was performed at 42° C. overnight. The probes were based upon an internal sequence of 4223 tbpA:

IRDLTRYDPG (SEQ ID NO: 31)

4236-RD 5′ ATTCGAGACTTAACACGCTATGACCCTGGC 3′ (SEQ ID NO: 32)

4237-RD 5′ ATTCGTGATTTAACTCGCTATGACCCTGGT 3′ (SEQ ID NO: 33).
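The relationship between the two probes and the peptide on which they are based can be checked by translating each oligonucleotide with the standard genetic code. The following is a minimal sketch; the helper names are illustrative and the codon table is built in the conventional TCAG order.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def differing_codons(a, b):
    """Return 1-based codon positions at which two equal-length sequences differ."""
    return [i // 3 + 1 for i in range(0, len(a), 3) if a[i:i + 3] != b[i:i + 3]]


PROBE_4236_RD = "ATTCGAGACTTAACACGCTATGACCCTGGC"  # SEQ ID NO: 32
PROBE_4237_RD = "ATTCGTGATTTAACTCGCTATGACCCTGGT"  # SEQ ID NO: 33

print(translate(PROBE_4236_RD))
print(translate(PROBE_4237_RD))
print(differing_codons(PROBE_4236_RD, PROBE_4237_RD))
```

Both probes translate to IRDLTRYDPG; they differ only in the third base of codons 2, 3, 5 and 10, so the pair covers alternative synonymous codon choices for the same peptide.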

Putative plaques were re-plated and submitted to second and third rounds of screening using the same procedures. Phage clone SLRD-A was used to subclone the tfr genes for sequence analysis.

Example 5

This Example illustrates immunoblot analysis of the phage lysates using anti-M. catarrhalis 4223 Tbp1 and Tbp2 antisera.

Proteins expressed by the phage eluants selected in Example 4 above were precipitated as follows. 60 μL of each phage eluant were combined with 200 μL E. coli LE392 plating cells, and incubated at 37° C. for 15 min. The mixture was inoculated into 10 ml of 1.0% NZamine A-0.5% NaCl-0.1% casamino acids-0.5% yeast extract-0.2% magnesium sulfate heptahydrate (NZCYM broth), supplemented with 200 mM EDDA, and grown at 37° C. for 18 hr, with shaking. DNAse was added to 1.0 ml of the culture, to a final concentration of 50 μg/ml, and the sample was incubated at 37° C. for 30 min. Trichloroacetic acid was added to a final concentration of 12.5%, and the mixture was left on ice for 15 min. Proteins were pelleted by centrifugation at 13,000×g for 10 min, and the pellet was washed with 1.0 ml of acetone. The pellet was air-dried and resuspended in 50 μL 4% SDS-20 mM Tris-HCl (pH 8.0)-0.2 mM EDTA (lysis buffer).

Following SDS-PAGE through an 11.5% gel, the proteins were transferred to Immobilon-P filters (Millipore) at a constant voltage of 20 V for 18 hr, in 25 mM Tris-HCl-220 mM glycine-20% methanol (transfer buffer). Membranes were blocked in 5% BSA in TBS for 30 min. at room temperature. Blots were exposed either to guinea pig anti-M. catarrhalis 4223 Tbp1, or to guinea pig anti-M. catarrhalis 4223 Tbp2 antiserum, diluted 1/500 in TBS-Tween, for 2 hr at room temperature. Following three sequential 10 min. washes in TBS-Tween, membranes were incubated in TBS-Tween containing a 1/4000 dilution of rProtein G-HRP for 30 min. at room temperature. Membranes were washed as described above, and immersed into CN/DAB substrate solution. Color development was arrested by immersing blots into water.

Three EMBL3 phage clones expressed both a 115 kDa protein, which reacted with anti-Tbp1 antiserum, and an 80 kDa protein, which reacted with anti-Tbp2 antiserum on Western blots, and were thus concluded to contain genes encoding the transferrin receptor proteins of Moraxella catarrhalis.

Example 6

This Example illustrates the subcloning of the M. catarrhalis 4223 Tbp1 protein gene, tbpA.

Plate lysate cultures of the recombinant phage described in Example 5 were prepared by combining phage eluant and E. coli LE392 plating cells, to produce confluent lysis on LB agar plates. Phage DNA was extracted from the plate lysates using a Wizard Lambda Preps DNA Purification System (Promega), according to manufacturer's instructions.

The EMBL3 clone LEM3-24 was found to contain a 13.2 kb insert, flanked by two SalI sites. A probe to a tbpA gene was prepared and consisted of a 300 base pair amplified product generated by PCR using two degenerate oligonucleotide primers corresponding to an amino acid sequence of part of the Tbp1 protein (FIG. 1). The primer sequences were based upon the amino acid sequences NEVTGLG (SEQ ID NO: 17) and GAINEIE (SEQ ID NO: 18), which had been found to be conserved among the deduced amino acid sequences from several different N. meningitidis and Haemophilus influenzae tbpA genes. The amplified product was cloned into pCRII (Invitrogen, San Diego, Calif.) and sequenced. The deduced amino acid sequence shared homology with other putative amino acid sequences derived from N. meningitidis and H. influenzae tbpA genes (FIGS. 12A to 12G). The subclone was linearized with NotI (New England Biolabs), and labelled using a digoxigenin random-labelling kit (Boehringer Mannheim), according to manufacturer's instructions. The concentration of the probe was estimated to be 2 ng/μL.

DNA from the phage clone was digested with HindIII, AvrII, SalI/SphI, or SalI/AvrII, and electrophoresed through a 0.8% agarose gel. DNA was transferred to a nylon membrane (Genescreen Plus, Dupont) using an LKB VacuGene XL vacuum transfer apparatus (Pharmacia). Following transfer, the blot was air-dried, and pre-hybridized in 5× SSC-0.1% N-lauroylsarcosine-0.02% sodium dodecyl sulfate-1.0% blocking reagent (Boehringer Mannheim) in 10 mM maleic acid-15 mM NaCl (pH 7.5) (pre-hybridization solution). Labelled probe was added to the pre-hybridization solution to a final concentration of 6 ng/ml, and the blot was incubated in the probe solution at 42° C. for 18 hr. The blot was washed twice in 2× SSC-0.1% SDS, for 5 min. each at room temperature, then twice in 0.1× SSC-0.1% SDS for 15 min. each at 60° C. Following the washes, the membrane was equilibrated in 100 mM maleic acid-150 mM NaCl (pH 7.5) (buffer 1) for 1 min, then left in 1.0% blocking reagent (Boehringer Mannheim) in buffer 1 (buffer 2) for 60 min, at room temperature. The blot was exposed to anti-DIG-alkaline phosphatase (Boehringer Mannheim) diluted 1/5000 in buffer 2, for 30 min. at room temperature. Following two 15 min. washes in buffer 1, the blot was equilibrated in 100 mM Tris-HCl (pH 9.5), 100 mM NaCl, 50 mM MgCl₂ (buffer 3) for 2 min. The blot was wetted with Lumigen PPD substrate (Boehringer Mannheim), diluted 1/100 in buffer 3, then wrapped in Saran wrap, and exposed to X-ray film for 30 min. The probe hybridized to a 3.8 kb HindIII-HindIII fragment, a 2.0 kb AvrII-AvrII fragment, and a 4.2 kb SalI-SphI fragment.

In order to subclone the 3.8 kb HindIII-HindIII fragment into pACYC177, phage DNA from the EMBL3 clone, and plasmid DNA from the vector pACYC177 (New England Biolabs), were digested with HindIII, and fractionated by electrophoresis on a 0.8% agarose gel. The 3.8 kb HindIII-HindIII phage DNA fragment, and the 3.9 kb HindIII-HindIII pACYC177 fragment, were excised from the gel and purified using a Geneclean kit (Bio 101 Inc., LaJolla, Calif.), according to manufacturer's directions. Purified insert and vector were ligated together using T4 DNA ligase (New England Biolabs), and transformed into E. coli HB101 (Gibco BRL). A Qiagen Plasmid Midi-Kit (Qiagen) was used to extract and purify sequencing-quality DNA from one of the ampicillin-resistant/kanamycin-sensitive transformants, which was found to carry a 3.8 kb HindIII-HindIII insert. The subclone was named pLEM3. As described in Example 9, below, subsequent sequencing revealed that pLEM3 contained the first about 2.0 kb of tbpA sequence (FIGS. 2 and 5A to 5J).

In order to subclone the remaining 1 kb of the tbpA gene, a 1.6 kb HindIII-HindIII fragment was subcloned into pACYC177 as described above, and transformed by electroporation into E. coli HB101 (Gibco BRL). A Midi-Plasmid DNA kit (Qiagen) was used to extract plasmid DNA from a putative kanamycin-sensitive transformant carrying a plasmid with a 1.6 kb HindIII-HindIII insert. The subclone was termed pLEM25. As described in Example 9 below, sequencing revealed that pLEM25 contained the remaining 1 kb of the tbpA gene (FIGS. 2 and 5A to 5J).

Example 7

This Example illustrates the subcloning of the M. catarrhalis 4223 tbpB gene.

As described above, in all Neisseriae and Haemophilus species examined prior to the present invention, tbpB genes have been found immediately upstream of the tbpA genes which share homology with the tbpA gene of M. catarrhalis 4223. However, the sequence upstream of the tbpA gene of M. catarrhalis 4223 did not correspond with other sequences encoding tbpB.

In order to localize the tbpB gene within the EMBL3 phage clone, a Southern blot was carried out using a degenerate probe from a highly conserved amino acid region within the Tbp2 protein. A degenerate oligonucleotide probe was designed corresponding to the sequence encoding EGGFYGP (SEQ ID NO: 30), which is conserved within the Tbp2 protein in a variety of Neisseriae and Haemophilus species. The probe was labelled with digoxigenin using an oligonucleotide tailing kit (Boehringer Mannheim), following the manufacturer's instructions. HindIII-digested EMBL3 clone DNA was fractionated through a 0.8% agarose gel, and transferred to a Genescreen Plus nylon membrane as described in Example 6. Following hybridization as described above, the membrane was washed twice in 2× SSC-0.1% SDS, for 5 min. each at room temperature, then twice in 0.1× SSC-0.1% SDS for 15 min. each, at 50° C. Detection of the labelled probe was carried out as described above. The probe hybridized to a 5.5 kb NheI-SalI fragment.

The 5.5 kb NheI-SalI fragment was subcloned into pBR328 as follows. LEM3-24 DNA, and pBR328 DNA, were digested with NheI-SalI, and electrophoresed through 0.8% agarose. The 5.5 kb NheI-SalI fragment, and the 4.9 kb pBR328 NheI-SalI fragment, were excised from the gel, and purified using a Geneclean kit as described in Example 6. The fragments were ligated together using T4 DNA ligase, and transformed into E. coli DH5. A Midi-Plasmid DNA kit (Qiagen) was used to extract DNA from an ampicillin-resistant/tetracycline-sensitive clone containing a 5.5 kb NheI-SalI insert. This subclone was termed pLEM23. Sequencing revealed that pLEM23 contained 2 kb of the tbpB gene from M. catarrhalis 4223 (FIG. 2).

Example 8

This Example illustrates the subcloning of M. catarrhalis Q8 tfr genes.

The M. catarrhalis Q8 tfr genes were subcloned as follows. Phage DNA was prepared from plates. Briefly, the top agarose layer from three confluent plates was scraped into 9 ml of SM buffer (0.1 M NaCl, 0.2% MgSO₄, 50 mM Tris-HCl, pH 7.6, 0.01% gelatin) and 100 μl of chloroform was added. The mixture was vortexed for 10 sec, then incubated at room temperature for 2 h. The cell debris was removed by centrifugation at 8000 rpm for 15 min at 4° C. in an SS34 rotor (Sorvall model RC5C). The phage was pelleted by centrifugation at 35,000 rpm in a 70.1 Ti rotor at 10° C. for 2 h (Beckman model L8-80) and was resuspended in 500 μl of SM buffer. The sample was incubated at 4° C. overnight, then RNAse and DNAse were added to final concentrations of 40 μg/ml and 10 μg/ml, respectively, and the mixture was incubated at 37° C. for 1 h. To the mixture were added 10 μl of 0.5 M EDTA and 5 μl of 10% SDS and the sample was incubated at 6° C. for 15 min. The mixture was extracted twice with phenol/chloroform (1:1) and twice with chloroform and the DNA was precipitated by the addition of 2.5 volumes of absolute ethanol.

A partial restriction map was generated and fragments were subcloned using the external Sal I sites from EMBL3 and internal AvrII or EcoR I sites as indicated in FIG. 4. In order to facilitate the subcloning, plasmid pSKMA was constructed which introduces a novel multiple cloning site into pBluescript.SK (Stratagene). Oligonucleotides were used to introduce restriction sites for Mst II, Sfi I, and Avr II between the Sal I and Hind III sites of pBluescript.SK:

Restriction sites marked on the sequence: Sal I, Cla I, Sfi I, Mst II, Avr II, HindIII

4639-RD 5′ TCGACGGTAT CGATGGCCTTAG GGGC CTAGGA 3′ (SEQ ID NO: 34)

4640-RD 3′ GCCATA GCTACCGG AATCCCCG GATCCTTCGA 5′ (SEQ ID NO: 35)

Plasmid pSLRD1 contains a ˜1.5 kb Sal I-Avr II fragment cloned into pSKMA; plasmids pSLRD2 and pSLRD4 contain ˜2 kb and 4 kb AvrII-AvrII fragments cloned into pSKMA, respectively, and contain the complete tbpA gene. Plasmid pSLRD3 contains a ˜2.3 kb AvrII-EcoR I fragment cloned into pSKMA and plasmid SLRD5 is a 22.7 kb EcoRI-EcoRI fragment cloned into pSKMA. These two clones contain the complete tbpB gene (FIG. 7).

Example 9

This Example illustrates sequencing of the M. catarrhalis tbp genes.

Both strands of the tbp genes subcloned according to Examples 6 to 8 were sequenced using an Applied Biosystems DNA sequencer. The sequences of the M. catarrhalis 4223 and Q8 tbpA genes are shown in FIGS. 5A to 5J and 10A to 10Q respectively. A derived amino acid sequence was compared with other Tbp1 amino acid sequences, including those of Neisseria meningitidis, Neisseria gonorrhoeae, and Haemophilus influenzae (FIGS. 12A to 12G). The sequences of the M. catarrhalis 4223 and Q8 tbpB genes are shown in FIGS. 6A to 6G and 11A to 11O respectively. In order to obtain sequence from the putative beginning of the tbpB gene of M. catarrhalis 4223, sequence data were obtained directly from the clone LEM3-24 DNA. This sequence was verified by screening clone DS-1754-1. The sequences of the translated tbpB genes from M. catarrhalis 4223 and Q8 shared homology with deduced Tbp2 amino acid sequences of Neisseria meningitidis, Neisseria gonorrhoeae, and Haemophilus influenzae (FIGS. 13A to 13F).

Example 10

This Example illustrates the generation of an expression vector to produce recombinant Tbp1 protein. The construction scheme is shown in FIGS. 14A to 14B.

Plasmid DNA from subclone pLEM3, prepared as described in Example 6, was digested with HindIII and BglI to generate a 1.84 kb BglI-HindIII fragment, containing approximately two-thirds of the tbpA gene. BamHI was added to the digest to eliminate a comigrating 1.89 kb BglI-HindIII vector fragment. In addition, plasmid DNA from the vector pT7-7 was digested with NdeI and HindIII. To create the beginning of the tbpA gene, an oligonucleotide was synthesized based upon the first 61 bases of the tbpA gene to the BglI site; an NdeI site was incorporated into the 5′ end. Purified insert, vector and oligonucleotide were ligated together using T4 ligase (New England Biolabs), and transformed into E. coli DH5α. DNA was purified from one of the 4.4 kb ampicillin-resistant transformants containing correct restriction sites (pLEM27).

Purified pLEM27 DNA was digested with HindIII, ligated to the 1.6 kb HindIII-HindIII insert fragment of pLEM25 prepared as described in Example 6, and transformed into E. coli DH5α. DNA was purified from an ampicillin-resistant transformant containing the correct restriction sites (pLEM29), and was transformed by electroporation into BL21 (DE3) (Novagen; Madison, Wis.) to produce E. coli pLEM29B-1.

A single isolated transformed colony was used to inoculate 100 ml of YT broth containing 100 μg/ml ampicillin, and the culture was grown at 37° C. overnight, shaking at 200 rpm. 200 μl of the overnight culture were inoculated into 10 ml of YT broth containing 100 μg/ml ampicillin, and the culture was grown at 37° C. to an OD₅₇₈ of 0.35. The culture was induced by the addition of 30 μl of 100 mM IPTG, and was grown at 37° C. for an additional 3 hours. One ml of culture was removed at the time of induction (t=0), and at t=1 hr and t=3 hrs. One ml samples were pelleted by centrifugation, and resuspended in 4% SDS-20 mM Tris.Cl, pH 8-200 μM EDTA (lysis buffer). Samples were fractionated on an 11.5% SDS-PAGE gel, and transferred onto Immobilon filters (Amersham). Blots were developed using anti-Tbp1 (M. catarrhalis 4223) antiserum, diluted 1:1000, as the primary antibody, and rProtein G conjugated with horseradish peroxidase (Zymed) as the secondary antibody. A chemiluminescent substrate (Lumiglo; Kirkegaard and Perry Laboratories, Gaithersburg, Md.) was used for detection. Induced recombinant proteins were visible on the Coomassie-stained gels (FIG. 15). The anti-Tbp1 (4223) antiserum recognized the recombinant proteins on Western blots.
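The final IPTG concentration implied by the induction step follows from a simple dilution calculation. The sketch below assumes the culture volume at induction is still the full 10 ml (that is, it ignores whether the t=0 sample was removed before or after IPTG was added); the function name is illustrative.

```python
def final_concentration(stock_conc, stock_volume, starting_volume):
    """Concentration after adding stock_volume of stock to starting_volume.

    Volumes must share one unit; the result is in the unit of stock_conc.
    """
    return stock_conc * stock_volume / (stock_volume + starting_volume)


# 30 ul (0.030 ml) of 100 mM IPTG added to an assumed 10 ml culture.
iptg_mM = final_concentration(stock_conc=100.0, stock_volume=0.030,
                              starting_volume=10.0)
print(f"final IPTG: {iptg_mM:.3f} mM")
```

Under this assumption the culture is induced at approximately 0.3 mM IPTG.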

Example 11

This Example illustrates the extraction and purification of recombinant Tbp1 of M. catarrhalis 4223.

Recombinant Tbp1 protein, which is contained in inclusion bodies, was purified from E. coli cells expressing the tbpA gene (Example 10), by a procedure as shown in FIG. 16. E. coli cells from a 500 ml culture, prepared as described in Example 10, were resuspended in 50 ml of 50 mM Tris-HCl, pH 8.0 containing 0.1 M NaCl and 5 mM AEBSF (protease inhibitor), and disrupted by sonication (3×10 min., 70% duty cycle). The extract was centrifuged at 20,000×g for 30 min. and the resultant supernatant, which contained >85% of the soluble proteins from E. coli, was discarded.

The remaining pellet (FIG. 16, PPT₁) was further extracted in 50 ml of 50 mM Tris, pH 8.0 containing 0.5% Triton X-100 and 10 mM EDTA. After centrifugation at 20,000×g for 30 min., the supernatant containing residual soluble proteins and the majority of the membrane proteins was discarded.

The remaining pellet (FIG. 16, PPT₂) was further extracted in 50 ml of 50 mM Tris, pH 8.0 containing 2 M urea and 5 mM dithiothreitol (DTT). After centrifugation at 20,000×g for 30 min., the resultant pellet (FIG. 16, PPT₃) obtained after the above extraction contained the purified inclusion bodies.

The Tbp1 protein was solubilized from PPT₃ in 50 mM Tris, pH 8.0, containing 6 M guanidine hydrochloride and 5 mM DTT. After centrifugation, the resultant supernatant was further purified on a Superdex 200 gel filtration column equilibrated in 50 mM Tris, pH 8.0, containing 2 M guanidine hydrochloride and 5 mM DTT. The fractions were analyzed by SDS-PAGE and those containing purified Tbp1 were pooled. Triton X-100 was added to the pooled Tbp1 fraction to a final concentration of 0.1%. The fraction was then dialyzed overnight at 4° C. against 50 mM Tris, pH 8.0 and then centrifuged at 20,000×g for 30 min. The protein remained soluble under these conditions, and the purified Tbp1 was stored at −20° C. The purification procedure shown in FIG. 16 produced Tbp1 protein that was at least 70% pure as determined by SDS-PAGE analysis (FIG. 17).

Example 12

This Example illustrates the construction of an expression plasmid for rTbp2 of M. catarrhalis 4223 without a leader sequence.

The construction scheme for the plasmid expressing rTbp2 is shown in FIGS. 18A and 18B. Oligonucleotides were used to construct the first approximately 58 bp of the M. catarrhalis 4223 tbpB gene encoding the mature protein. An NdeI site was incorporated into the 5′ end of the oligonucleotides:

5′ TATGTGTGGTGGCAGTGGTGGTTCAAATCCACCTGCTCCTACGCCCATTCCAAATG 3′ (SEQ ID NO: 36)

3′ ACACACCACCGTCACCACCAAGTTTAGGTGGACGAGGATGCGGGTAAGGTTTACGATC 5′ (SEQ ID NO: 37)
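The way the two oligonucleotides anneal can be checked computationally. The sketch below treats the spaces inside the printed sequences as line-wrap artifacts (an assumption), pairs the lower strand against the upper strand with a two-base offset, and reads off the single-stranded ends. The function and variable names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# SEQ ID NO: 36, written 5'->3' (internal space in the printed text removed).
TOP = "TATGTGTGGTGGCAGTGGTGGTTCAAATCCACCTGCTCCTACGCCCATTCCAAATG"
# SEQ ID NO: 37, written 3'->5' exactly as printed (internal space removed).
BOTTOM_3TO5 = "ACACACCACCGTCACCACCAAGTTTAGGTGGACGAGGATGCGGGTAAGGTTTACGATC"


def anneals(top, bottom_3to5, offset):
    """True if bottom (3'->5') base-pairs with top from top[offset] to its 3' end."""
    paired_top = top[offset:]
    return bottom_3to5[:len(paired_top)] == paired_top.translate(COMPLEMENT)


OFFSET = 2                                   # unpaired bases at the 5' end of TOP
duplex_ok = anneals(TOP, BOTTOM_3TO5, OFFSET)
top_overhang = TOP[:OFFSET]                  # 5' overhang of the upper strand
# Unpaired bases of the lower strand, reversed so they read 5'->3'.
bottom_overhang = BOTTOM_3TO5[len(TOP) - OFFSET:][::-1]

print(duplex_ok, top_overhang, bottom_overhang, len(BOTTOM_3TO5))
```

The duplex pairs perfectly over 54 bases, leaving a 5′-TA overhang (as produced by NdeI, which cuts CA^TATG) and a 5′-CTAG overhang (as produced by NheI, which cuts G^CTAGC). The lower strand is 58 bases long, in line with the "approximately 58 bp" stated above.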

An NheI-ClaI fragment, containing approximately 1 kb of the tbpB gene from pLEM23, prepared as described in Example 7, was ligated to the above oligonucleotides and inserted into pT7-7 cut with NdeI-ClaI, generating pLEM31, which thus contains the 5′-half of tbpB. Oligonucleotides also were used to construct the last approximately 104 bp of the tbpB gene, from the AvaII site to the end of the gene. A BamHI site was incorporated into the 3′ end of the oligonucleotides:

5′ GTCCAAATGCAAACGAGATGGGCGGGTCATTTACACACAACGCCGATGACAGCAAAGCCTCTGTGGTCTTTGGCACAAAAAGACAACAAGAAGTTAAGTAGTAG 3′ (SEQ ID NO: 38)

3′ GTTTACGTTTGCTCTACCCGCCCAGTAAATGTGTGTTGCGGCTACTGTCGTTTCGGAGACACCAGAAACCGTGTTTTTCTGTTGTTCTTCAATTCATCATCCTAG 5′ (SEQ ID NO: 39)

A ClaI-AvaII fragment from pLEM23, containing approximately 0.9 kb of the 3′-end of the tbpB gene, was ligated to the AvaII-BamHI oligonucleotides, and inserted into pT7-7 cut with ClaI-BamHI, generating pLEM32. The 1.0 kb NdeI-ClaI insert from pLEM31 and the 1.0 kb ClaI-BamHI insert from pLEM32 were then inserted into pT7-7 cut with NdeI-BamHI, generating pLEM33 which has a full-length tbpB gene under the direction of the T7 promoter.

DNA was purified from pLEM33 and transformed by electroporation into electrocompetent BL21(DE3) cells (Novagen; Madison, Wis.), to generate strain pLEM33B-1. Strain pLEM33B-1 was grown, and induced using IPTG, as described above in Example 10. Expressed proteins were resolved by SDS-PAGE and transferred to membranes suitable for immunoblotting. Blots were developed using anti-4223 Tbp2 antiserum, diluted 1:4000, as the primary antibody, and rProtein G conjugated with horseradish peroxidase (Zymed) as the secondary antibody. A chemiluminescent substrate (Lumiglo; Kirkegaard and Perry Laboratories, Gaithersburg, Md.) was used for detection. Induced recombinant proteins were visible on the Coomassie blue-stained gels (FIG. 19). The anti-4223 Tbp2 antiserum recognized the recombinant proteins on Western blots.

Example 13

This Example illustrates the generation of an expression plasmid for rTbp2 of M. catarrhalis 4223 with a leader sequence.

The construction scheme is shown in FIGS. 18A to 18B. Oligonucleotides containing the natural leader sequence of the M. catarrhalis 4223 tbpB gene were used to construct the first approximately 115 bp of the tbpB gene to the NheI site. An NdeI site was incorporated into the 5′ end of the oligonucleotides:

5′ TATGAAACACATTCCTTTAACCACACTGTGTGTGGCAATCTCTGCCGTCTTATTAACCGCTTGTGGTGGCAGTGGTGGTTCAAATCCACCTGCTCCTACGCCCATTCCAAATG 3′ (SEQ ID NO: 40)

3′ ACTTTGTGTAAGGAAATTGGTGTGACACACACCGTTAGAGACGGCAGAATAATTGGCGAACACCACCGTCACCACCAAGTTTAGGTGGACGAGGATGCGGGTAAGGTTTACGATC 5′ (SEQ ID NO: 41)

The NdeI-NheI oligonucleotides were ligated to pLEM33 cut with NdeI-NheI, generating pLEM37, which thus contains a full-length 4223 tbpB gene encoding the Tbp2 protein with its leader sequence, driven by the T7 promoter.

DNA from pLEM37 was purified and transformed by electroporation into electrocompetent BL21(DE3) cells (Novagen; Madison, Wis.), to generate strain pLEM37B-2. pLEM37B-2 was grown, and induced using IPTG, as described above in Example 10. Expressed proteins were resolved by SDS-PAGE and transferred to membranes suitable for immunoblotting. Blots were developed using anti-4223 Tbp2 antiserum, diluted 1:4000, as the primary antibody, and rProtein G conjugated with horseradish peroxidase (Zymed) as the secondary antibody. A chemiluminescent substrate (Lumiglo; Kirkegaard and Perry Laboratories, Gaithersburg, Md.) was used for detection. Induced recombinant proteins were visible on Coomassie blue-stained gels (FIG. 21). The anti-4223 Tbp2 antiserum recognized the recombinant proteins on Western blots.

Example 14

This Example illustrates the construction of an expression plasmid for rTbp2 of M. catarrhalis Q8 without a leader sequence.

The construction scheme for rTbp2 is shown in FIGS. 20A and 20B. The 5′-end of the tbpB gene of M. catarrhalis Q8 was PCR amplified from the Cys¹ codon of the mature protein through the Bsm I restriction site. An Nde I restriction site was introduced at the 5′ end, for later cloning into pT7-7, and the final PCR fragment was 238 bp in length. The PCR primers are indicated below:

5247.RD (SEQ ID NO: 42), with the NdeI site (CATATG) followed by codons encoding C G G S S G G F N:

5′ GAATTCCATATG TGT GGT GGG AGC TCT GGT GGT TTC AAT C 3′

5236.RD (SEQ ID NO: 43):

5′ CCCATGGCAGGTTCTTGAATGCCTGAAACT 3′

The Q8 tbpB gene was subcloned in two fragments contained on plasmids SLRD3 and SLRD5, prepared as described in Example 8. Plasmid SLRD3-5 was constructed to contain the full-length tbpB gene by digesting SLRD5 with EcoR I and Dra I, which releases the 3′-end of tbpB, and inserting this ˜619 bp fragment into SLRD3 which had been digested with EcoR I and Sma I. The 1.85 kb Bsm I-BamH I fragment from SLRD3-5 was ligated with the 238 bp PCR fragment and inserted into pT7-7 that had been digested with Nde I and BamH I, generating plasmid SLRD35B. This plasmid thus contains the full-length tbpB gene without its leader sequence, under the direction of the T7 promoter. DNA from SLRD35B was purified and transformed by electroporation into electrocompetent BL21(DE3) cells to generate strain SLRD35BD, which was grown and induced using IPTG, as described above in Example 10. Expressed proteins were resolved by SDS-PAGE and the induced Tbp2 protein was clearly visible by Coomassie blue staining (FIG. 19).

Example 15

This Example illustrates the generation of an expression plasmid for rTbp2 of M. catarrhalis Q8 with a leader sequence.

The construction scheme for the rTbp2 is shown in FIGS. 20A and 20B. The 5′-end of the Q8 tbpB gene was PCR amplified from the ATG start codon to the Bsm I restriction site. An Nde I site was engineered at the 5′-end, to facilitate cloning into the pT7-7 expression vector, and the final PCR fragment was 295 bp. The PCR primers are indicated below:

5235.RD (SEQ ID NO: 44), with the Nde I site (CATATG) followed by codons encoding K H I P L T:

5′ GAATTCCATATG AAA CAC ATT CCT TTA ACC 3′

5236.RD (SEQ ID NO: 43):

5′ CCCATGGCAGGTTCTTGAATGCCTGAAACT 3′

SLRD3-5 (Example 14) was digested with Bsm I and BamH I, generating a 1.85 kb fragment, which was ligated with the 295 bp PCR fragment and inserted into pT7-7 that had been digested with Nde I and BamH I. The resulting plasmid SLRD35A thus contains the full-length Q8 tbpB gene with its endogenous leader sequence under the control of the T7 promoter. DNA from SLRD35A was purified and transformed by electroporation into electrocompetent BL21(DE3) cells to generate strain SLRD35AD, which was grown and induced using IPTG, as described above in Example 10. Expressed proteins were resolved by SDS-PAGE and the induced Tbp2 protein was clearly visible by Coomassie blue staining (FIG. 19).

Example 16

This Example illustrates the extraction and purification of rTbp2 of M. catarrhalis 4223 and Q8 from E. coli.

pLEM37B (4223) and SLRD35AD (Q8) transformants were grown to produce Tbp2 in inclusion bodies and then the Tbp2 was purified according to the scheme in FIG. 22. E. coli cells from a 500 mL culture were resuspended in 50 mL of 50 mM Tris-HCl, pH 8.0 containing 5 mM AEBSF (protease inhibitor), and disrupted by sonication (3×10 min, 70% duty cycle). The extract was centrifuged at 20,000×g for 30 min and the resultant supernatant, which contained >95% of the soluble proteins from E. coli, was discarded.

The remaining pellet (PPT₁) was further extracted in 50 mL of 50 mM Tris, pH 8.0 containing 0.5% Triton X-100 and 10 mM EDTA. The mixture was stirred at 4° C. for at least 2 hours and then centrifuged at 20,000×g for 30 min, and the supernatant containing residual soluble proteins and the majority of the membrane proteins was discarded.

The resultant pellet (PPT₂) obtained after the above extraction contained the inclusion bodies. The Tbp2 protein was solubilized in 50 mM Tris, pH 8.0, containing 6 M guanidine and 5 mM DTT. After centrifugation, the resultant supernatant was further purified on a Superdex 200 gel filtration column equilibrated in 50 mM Tris, pH 8.0, containing 2 M guanidine and 5 mM DTT. The fractions were analyzed by SDS-PAGE and those containing purified Tbp2 were pooled. Triton X-100 was added to the pooled Tbp2 fraction to a final concentration of 0.1%. The fraction was then dialyzed overnight at 4° C. against PBS, and then centrifuged at 20,000×g for 30 min. The protein remained soluble under these conditions and the purified Tbp2 was stored at −20° C. FIG. 22 shows the SDS-PAGE analysis of fractions of the purification process for rTbp2 from strain 4223 (Panel A) and strain Q8 (Panel B). The rTbp2 was at least 70% pure.

Groups of five BALB/c mice were injected three times subcutaneously (s.c.) on days 1, 29 and 43 with purified rTbp2 (0.3 mg to 10 mg) from M. catarrhalis strains 4223 and Q8 in the presence or absence of AlPO₄ (1.5 mg per dose). Blood samples were taken on days 14, 28, 42 and 56 for analysing the anti-rTbp2 antibody titers by EIAs.

Groups of two rabbits and two guinea pigs (Charles River, Quebec) were immunized intramuscularly (i.m.) on day 1 with a 5 mg dose of purified rTbp2 protein emulsified in complete Freund's adjuvant (CFA). Animals were boosted on days 14 and 29 with the same dose of protein emulsified in incomplete Freund's adjuvant (IFA). Blood samples were taken on day 42 for analysing anti-rTbp2 antibody titers and bactericidal activity. Table 2 below shows the bactericidal activity of antibodies raised to the recombinant transferrin binding proteins rTbp1 (4223), rTbp2 (4223) and rTbp2 (Q8), prepared as described in these Examples, against M. catarrhalis strains 4223 and Q8.
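The bactericidal titres in Tables I and 2 are reported in log₂ units of the antiserum dilution that kills 50% of cells. The following minimal sketch shows how such a value maps to a reciprocal dilution and back. The function names are illustrative assumptions and do not come from the specification.

```python
import math


def titre_to_dilution(log2_titre):
    """Reciprocal serum dilution for a titre expressed in log2 units.

    Example: a log2 titre of 10 corresponds to a 1:1024 dilution.
    """
    return 2.0 ** log2_titre


def dilution_to_titre(reciprocal_dilution):
    """Inverse conversion: reciprocal dilution back to a log2 titre."""
    return math.log2(reciprocal_dilution)


if __name__ == "__main__":
    # 3.0 is the lower reporting limit used in the tables ("<3.0");
    # 10 and 15 bracket the post-immune rTbp2 (4223) range in Table 2.
    for titre in (3.0, 10.0, 15.0):
        print(f"log2 titre {titre:g} -> 1:{titre_to_dilution(titre):.0f}")
```

Read this way, an entry of "<3.0" means that no 50% killing was seen at dilutions of 1:8 or greater.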

Example 17

This Example illustrates the binding of Tbp2 to human transferrin in vitro.

Transferrin-binding activity of Tbp2 was assessed according to the procedures of Schryvers and Lee (ref. 28) with modifications. Briefly, purified rTbp2 was subjected to discontinuous electrophoresis through 12.5% SDS-PAGE gels. The proteins were electrophoretically transferred to PVDF membrane and incubated with horseradish peroxidase-conjugated human transferrin (HRP-human transferrin, 1:50 dilution) (Jackson ImmunoResearch Labs Inc., Mississauga, Ontario) at 4° C. overnight. LumiGLO substrate (Kirkegaard & Perry Laboratories, Inc., Gaithersburg, Md.) was used for chemiluminescent detection of HRP activity according to the manufacturer's instructions. Both 4223 rTbp2 and Q8 rTbp2 bind to human transferrin under these conditions, as shown in FIG. 24.

Example 18

This Example illustrates antigenic conservation of Tbp2 amongst M. catarrhalis strains.

Whole cell lysates of M. catarrhalis strains and E. coli strains expressing recombinant Tbp2 proteins were separated by SDS-PAGE and electrophoretically transferred to PVDF membrane. Guinea pig anti-4223 rTbp2 or anti-Q8 rTbp2 antisera were used as first antibody and alkaline phosphatase conjugated goat anti-guinea pig antibody was used as second antibody to detect Tbp2. M. catarrhalis strains 3, 56, 135, 585, 4223, 5191, 8185 and ATCC 25240 were tested and all showed specific reactivity with anti-4223 rTbp2 or anti-Q8 rTbp2 antibody (FIG. 25).

Table 3 illustrates the ability of anti-rTbp2 antibodies from one M. catarrhalis strain to recognize native or recombinant protein from a homologous or heterologous M. catarrhalis strain.

Example 19

This Example illustrates the cloning of the tbpB gene from an M. catarrhalis strain M35 genomic library.

An EMBL3 phage library was prepared in the same manner as described in Example 3 for strains 4223 and Q8 from chromosomal DNA prepared from M. catarrhalis strain M35 in the same manner as described in Example 2 for strains 4223 and Q8. The M35 phage library was screened with a digoxigenin-labelled (Boehringer Mannheim, Laval, Quebec) 4223 tbpA gene probe (see Example 4). Phage clone M35-2.3 was found to contain a 13 kb insert of the M35 tfr genes. The tbpB gene was localized to a 7.5 kb Nhe I-Sal I fragment by restriction enzyme and Southern blot analyses and was subcloned into pBR328 for sequence analysis, generating plasmid pLEM40.

A partial restriction map of the M35 tbpB gene is shown in FIG. 26. The nucleotide and deduced amino acid sequences of the M35 tbpB gene are shown in FIGS. 27A to 27K. The M35 tbpB gene encodes a 706 amino acid protein of molecular weight 76.5 kDa. When the M35 tbpB sequence was aligned with the 4223 tbpB protein, it was found to be 86% identical and 90% similar.

Example 20

This Example illustrates the PCR amplification of the tbpB genes from M. catarrhalis strains R1, 3 and LES1.

Oligonucleotide primers were based upon the following sequences, which are found in the intergenic regions surrounding 4223 tbpB:

5′GATGGGATAAGCACGCCCTACTT 3′ (SEQ ID NO: 58) sense primer (4940)

5′CCCATCAGCCAAACAAACATTGTGT 3′ (SEQ ID NO: 59) antisense primer (4967)

PCR amplification was performed in buffer containing 100 mM Tris-HCl (pH 8.9), 25 mM KCl, 5 mM (NH₄)₂SO₄ and 2 mM MgSO₄. Each 100 μl reaction mixture contained 10 ng of chromosomal DNA, 1 μg each primer, 2.5 U Pwo DNA polymerase (Boehringer Mannheim) and 0.2 mM dNTPs (Perkin Elmer, Foster City, Calif.). The cycling conditions were 25 cycles of 95° C. for 30 sec, 45° C. for 1.0 min and 72° C. for 2.0 min, followed by a 10 min elongation at 72° C. Specific 2.4 kb fragments were amplified and DNA was purified for direct sequencing by agarose gel extraction, using a Geneclean kit (Bio 101 Inc., Vista, Calif.). Plasmid DNA for sequencing was prepared using a Qiagen Plasmid Midi kit (Qiagen, Chatsworth, Calif.). DNA samples were sequenced using an ABI model 373A DNA sequencer using dye terminator chemistry. Oligonucleotide primers of 17 to 25 bases in length were used to sequence both strands of the genes.
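As a quick check on the two amplification primers given above (SEQ ID NO: 58 and SEQ ID NO: 59), the following minimal sketch reports each primer's length and GC content, together with the sequence to which it would anneal. The helper names are illustrative assumptions; only the primer sequences themselves are taken from the text.

```python
# Primer sequences copied from the text (SEQ ID NO: 58 and SEQ ID NO: 59).
PRIMERS = {
    "4940 (sense)": "GATGGGATAAGCACGCCCTACTT",
    "4967 (antisense)": "CCCATCAGCCAAACAAACATTGTGT",
}

# Translation table mapping each unambiguous base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(
            f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.2f}, "
            f"anneals to 5'-{reverse_complement(seq)}-3'"
        )
```

The sketch only handles the four unambiguous bases; degenerate IUPAC codes would need a larger translation table.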

Partial restriction maps of the M. catarrhalis strain R1, 3 and LES1 tbpB genes are shown in FIGS. 28, 29 and 30 respectively. The nucleotide and deduced amino acid sequences of the strain R1, 3 and LES1 tbpB genes are shown in FIGS. 31A to 31G, 32A to 32K and 33A to M′, respectively. The strain 3 tbpB gene encodes a 712 amino acid protein of molecular weight 76.9 kDa which is more closely related to the strain Q8 Tbp2 protein than to the 4223 Tbp2 protein. The Q8 and strain 3 Tbp2 proteins are 71% identical and 79% similar, whereas the 4223 and strain 3 Tbp2 proteins are 51% identical and 64% similar. The strain LES1 tbpB gene encodes a 713 amino acid protein of molecular weight 76.8 kDa which is 63% identical to both the 4223 and Q8 Tbp2 proteins.

From the sequence analysis, there appear to be at least two gene families which can be identified for M. catarrhalis tbpB, one comprising strains 4223, R1 and M35 and the other comprising strains Q8 and 3, with strain LES1 being equally related to both families. This finding is similar to that of the N. meningitidis tbpB genes which can be divided into two sub-groups (ref. 29). There is limited sequence homology between the M. catarrhalis Tbp2 proteins and those from other organisms such as Actinobacillus pleuropneumoniae, H. influenzae, N. gonorrhoeae, N. meningitidis and P. haemolytica (ref. 30). The homology is scattered in small peptide motifs throughout the sequence and is illustrated by underlining in FIGS. 34A to 34D. The conserved LEGGFYG (SEQ ID NO: 60) epitope was present, as found in Tbp2 from other M. catarrhalis strains as well as the H. influenzae and N. meningitidis Tbp2 proteins.
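The percent identity and percent similarity figures quoted in this Example come from pairwise alignment of the deduced Tbp2 sequences. The specification does not state the program or scoring parameters used, so the following is only a minimal sketch of the general idea: a Needleman-Wunsch global alignment (ref. 32) with simple match, mismatch and gap scores, followed by counting identical and similar columns. The scoring values, the grouping of "similar" residues, the choice of alignment length as the denominator and the variant peptide LDGGFYG are all illustrative assumptions; only the LEGGFYG motif is taken from the text.

```python
# Illustrative groups of chemically similar residues (an assumption,
# not the similarity definition used in the specification).
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG")


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def _similar(x, y):
    """True if two residues are identical or share a similarity group."""
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def percent_identity_similarity(aln_a, aln_b):
    """Percent identical and percent similar columns over alignment length."""
    length = len(aln_a)
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    identical = sum(x == y for x, y in pairs)
    similar = sum(_similar(x, y) for x, y in pairs)
    return 100.0 * identical / length, 100.0 * similar / length


if __name__ == "__main__":
    # LEGGFYG is the conserved motif from the text; LDGGFYG is a
    # hypothetical variant used only to exercise the functions.
    aln = needleman_wunsch("LEGGFYG", "LDGGFYG")
    ident, simil = percent_identity_similarity(*aln)
    print(aln[0])
    print(aln[1])
    print(f"{ident:.1f}% identical, {simil:.1f}% similar")
```

Full-length protein comparisons would normally use a substitution matrix and affine gap penalties from an established alignment package, and different conventions for the denominator will shift the reported percentages slightly.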

Example 21

This Example illustrates the bactericidal antibody activity of guinea pig anti-4223 rTbp2 and anti-Q8 rTbp2 antibodies.

The bactericidal antibody assay was performed as described by ref. 31. Briefly, the M. catarrhalis strains were grown to an OD₅₇₈ of 0.5 in BHI medium containing 25 mM EDDA. The bacteria were diluted so that the pre-bleed control plates contained 100 to 300 cfu. Guinea pig anti-rTbp2 antisera and pre-bleed controls were heated to 56° C. for 30 min to inactivate endogenous complement and were diluted 1:64 with veronal buffer containing 0.1% BSA (VBS). Guinea pig complement was diluted 1:10 in VBS. Twenty-five μl each of diluted antiserum, bacteria and complement were added to duplicate wells of a 96 well microtiter plate. The plates were incubated at 37° C. for 60 min, gently shaking at 70 rpm on a rotary platform. Fifty μl of each reaction mixture were plated onto Mueller Hinton agar plates which were incubated at 37° C. for 24 h, then room temperature for 24 h, before the bacteria were counted. Antisera were determined to be bactericidal if ≧50% of bacteria were killed compared with negative controls. Each assay was repeated at least twice in duplicate and the results are shown in Table 4.

The anti-rTbp2 bactericidal antibody activity correlates with the putative gene families identified by sequencing, as described in Example 20. Anti-4223 rTbp2 antibody kills those strains within its own family, i.e. 4223, R1 and M35, while anti-Q8 rTbpB antibody kills those strains within its family, i.e. Q8, 3 and LES1. The anti-4223 rTbp2 antibody also killed strains VH-9, H-04 and ATCC 25240 indicating that the latter strains may be part of the 4223 family. Strain H-04 was also killed by anti-Q8 rTbpB antibody.
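The scoring used in this Example can be summarised in a few lines of code. The following minimal sketch computes percent killing from colony counts and assigns the symbols defined in the footnote to Table 4, together with the 50% bactericidal cut-off stated above. The function names, the use of ASCII "-" and "+/-" for the table symbols, and the handling of values that fall between the stated bands (for example 25.5%) are assumptions made for illustration.

```python
def percent_killing(control_cfu, test_cfu):
    """Percent of bacteria killed relative to the negative (pre-bleed) control."""
    if control_cfu <= 0:
        raise ValueError("control colony count must be positive")
    return 100.0 * (control_cfu - test_cfu) / control_cfu


def killing_score(killing):
    """Map percent killing to the symbols of the Table 4 footnote.

    Bands from the footnote: 0-25% "-", 26-49% "+/-", 50-75% "+",
    76-100% "++". Values between bands are assigned to the lower band
    (an assumption), and negative values (growth) are scored "-".
    """
    if killing < 26:
        return "-"
    if killing < 50:
        return "+/-"
    if killing < 76:
        return "+"
    return "++"


def is_bactericidal(killing):
    """Antisera are called bactericidal at 50% killing or more."""
    return killing >= 50


if __name__ == "__main__":
    # Hypothetical colony counts, chosen only to illustrate the calculation.
    control, treated = 200, 40
    k = percent_killing(control, treated)
    print(f"{k:.0f}% killing -> {killing_score(k)}, bactericidal: {is_bactericidal(k)}")
```

In practice the counts passed in would be means over the duplicate wells and repeated assays described above.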

Example 22

This Example illustrates the sequence analysis of the open reading frame (ORF) within the intergenic region between M. catarrhalis tbpA and tbpB.

The intergenic region was sequenced for strains 4223 and Q8 and a single open reading frame was identified. This orf, identified as orf3, was located about 1 kb downstream of tbpA and about 273 bp upstream of tbpB in each genome (FIG. 2, strain 4223; FIG. 7, strain Q8). The nucleotide and deduced amino acid sequences of the entire 4223 tbpA-orf3-tbpB gene loci are shown in FIGS. 35A to 35M′. The encoded 4223 and Q8 ORF3 proteins are 98% identical, 512 amino acid proteins, of molecular weight 58.1 kDa and 57.9 kDa, respectively. The alignment of the ORF3 protein sequences is shown in FIG. 36.

SUMMARY OF THE DISCLOSURE

In summary of this disclosure, the present invention provides purified and isolated DNA molecules containing transferrin receptor genes of Moraxella catarrhalis, the sequences of these transferrin receptor genes, and the derived amino acid sequences thereof. The genes and DNA sequences are useful for diagnosis, immunization, and the generation of diagnostic and immunological reagents. Immunogenic compositions, including vaccines, based upon expressed recombinant Tbp1 and/or Tbp2, portions thereof, or analogs thereof, can be prepared for prevention of diseases caused by Moraxella. Modifications are possible within the scope of this invention.

TABLE I
BACTERICIDAL ANTIBODY TITRES FOR M. CATARRHALIS ANTIGENS

                       BACTERICIDAL TITRE³ RH408⁴     BACTERICIDAL TITRE Q8⁵
ANTIGEN¹   SOURCE OF   Pre-Immune    Post-Immune      Pre-Immune    Post-Immune
           ANTISERA²
TBP1       GP          <3.0          4.2-6.9          <3.0          4.4-6.2
TBP2       GP          <3.0          12.0-13.6        <3.0          <3.0-4.0

¹antigens isolated from M. catarrhalis 4223
²GP = guinea pig
³bactericidal titres: expressed in log₂ as the dilution of antiserum capable of killing 50% of cells
⁴M. catarrhalis RH408 is a non-clumping derivative of M. catarrhalis 4223
⁵M. catarrhalis Q8 is a clinical isolate which displays a non-clumping phenotype

TABLE 2

                Bactericidal titre - RH408     Bactericidal titre - Q8
Antigen         pre-immune    post-immune      pre-immune    post-immune
rTbp1 (4223)    <3.0          <3.0             <3.0          <3.0
rTbp2 (4223)    <3.0          10-15            <3.0          <3.0
rTbp2 (Q8)      NT            NT               <3.0          5.5-7.5

Antibody titres are expressed in log₂ as the dilution of antiserum capable of killing 50% of cells
NT = not tested

TABLE 3
ELISA titres for anti-rTbp2 antibodies recognizing native or rTbp2 from strain 4223 or rTbp2 from strain Q8

                      Anti-rTbp2 (4223) Antibody Titres     Anti-rTbp2 (Q8) Antibody Titres
Coated antigen        Rabbit antisera   Guinea pig antisera   Rabbit antisera   Guinea pig antisera
Native Tbp2 (4223)    409,600           1,638,400             25,600            51,200
                      204,800           1,638,400             25,600            102,400
rTbp2 (4223)          409,600           1,638,400             102,400           204,800
                      409,600           1,638,400             102,400           204,800
rTbp2 (Q8)            409,600           1,638,400             1,638,400         1,638,400
                      102,400           1,638,400             409,600           1,638,400

TABLE 4
Bactericidal antibody activity of guinea pig anti-rTbpB antisera

                         Bactericidal Antibody Activity*
M. catarrhalis strain    Anti-4223 rTbp2    Anti-Q8 rTbp2
4223                     ++                 −
M35                      ++                 −
R1                       ++                 −
LES1                     −                  +
Q8                       −                  ++
3                        −                  ±
VH-9                     ++                 −
H-04                     ++                 ++
ATCC 25240               **                 −

*killing by antiserum diluted 1:64 compared to negative controls: − indicates 0 to 25% killing; ± indicates 26 to 49%; + indicates 50 to 75%; ++ indicates 76 to 100% killing.

REFERENCES

1. Brorson, J-E., A. Axelsson, and S. E. Holm. 1976. Studies on Branhamella catarrhalis (Neisseria catarrhalis) with special reference to maxillary sinusitis. Scand. J. Infect. Dis. 8:151-155.

2. Catlin, B. W., 1990. Branhamella catarrhalis: an organism gaining respect as a pathogen. Clin. Microbiol. Rev. 3: 293-320.

3. Hager, H., A. Verghese, S. Alvarez, and S. L. Berk. 1987. Branhamella catarrhalis respiratory infections. Rev. Infect. Dis. 9:1140-1149.

4. McLeod, D. T., F. Ahmad, M. J. Croughan, and M. A. Calder. 1986. Bronchopulmonary infection due to M. catarrhalis. Clinical features and therapeutic response. Drugs 31(Suppl.3):109-112.

5. Nicotra, B., M. Rivera, J. I. Luman, and R. J. Wallace. 1986. Branhamella catarrhalis as a lower respiratory tract pathogen in patients with chronic lung disease. Arch. Intern. Med. 146:890-893.

6. Ninane, G., J. Joly, and M. Kraytman. 1978. Bronchopulmonary infection due to Branhamella catarrhalis 11 cases assessed by transtracheal puncture. Br. Med. J. 1:276-278.

7. Srinivasan, G., M. J. Raff, W. C. Templeton, S. J. Givens, R. C. Graves, and J. C. Mel. 1981. Branhamella catarrhalis pneumonia. Report of two cases and review of the literature. Am. Rev. Respir. Dis. 123:553-555.

8. West, M., S. L. Berk, and J. K. Smith. 1982. Branhamella catarrhalis pneumonia. South. Med. J. 75:1021-1023.

9. Christensen, J. J., and B. Bruun. 1985. Bacteremia caused by a beta-lactamase producing strain of Branhamella catarrhalis. Acta. Pathol. Microbiol. Immunol. Scand. Sect. B 93:273-275.

10. Craig, D. B., and P. A. Wehrle. 1983. Branhamella catarrhalis septic arthritis. J. Rheumatol. 10:985-986.

11. Guthrie, R., K. Bakenhaster, R. Nelson, and R. Woskobnick. 1988. Branhamella catarrhalis sepsis: a case report and review of the literature. J. Infect. Dis. 158:907-908.

12. Hiroshi, S., E. J. Anaissie, N. Khardori, and G. P. Bodey. 1988. Branhamella catarrhalis septicemia in patients with leukemia. Cancer 61:2315-2317.

13. O'Neill, J. H., and P. W. Mathieson. 1987. Meningitis due to Branhamella catarrhalis. Aust. N. Z. J. Med. 17:241-242.

14. Murphy, T. F. 1989. The surface of Branhamella catarrhalis: a systematic approach to the surface antigens of an emerging pathogen. Pediatr. Infect. Dis. J. 8:S75-S77.

15. Van Hare, G. F., P. A. Shurin, C. D. Marchant, N. A. Cartelli, C. E. Johnson, D. Fulton, S. Carlin, and C. H. Kim. Acute otitis media caused by Branhamella catarrhalis: biology and therapy. Rev. Infect. Dis. 9:16-27.

16. Jorgensen, J. H., Doern, G. V., Maher, L. A., Howell, A. W., and Redding, J. S., 1990 Antimicrobial resistance among respiratory isolates of Haemophilus influenzae, Moraxella catarrhalis, and Streptococcus pneumoniae in the United States. Antimicrob. Agents Chemother. 34:2075-2080.

17. Schryvers, A. B. and Morris, L. J. 1988 Identification and Characterization of the transferrin receptor from Neisseria meningitidis. Mol. Microbiol. 2:281-288.

18. Lee, B. C., Schryvers, A. B. Specificity of the lactoferrin and transferrin receptors in Neisseria gonorrhoeae. Mol. Microbiol. 1988; 2: 827-9.

19. Schryvers, A. B. Characterization of the human transferrin and lactoferrin receptors in Haemophilus influenzae. Mol. Microbiol. 1988; 2: 467-72.

20. Schryvers, A. B. and Lee, B. C. (1988) Comparative analysis of the transferrin and lactoferrin binding proteins in the family Neisseriaceae. Can. J. Microbiol. 35, 409-415.

21. Yu, R. and Schryvers, A. B., 1993. The interaction between human transferrin and transferrin binding protein 2 from Moraxella (Branhamella) catarrhalis differs from that of other human pathogens. Microbiol. Pathogenesis, 15:433-445.

22. O'Hagan, 1992. Clin. Pharmokinet. 22:1.

23. Ulmer et al., 1993. Curr. Opinion Invest. Drugs 2: 983-989.

24. Lockhoff, O., 1991. Glycolipids as immunomodulators: Synthesis and properties. Chem. Int. Ed. Engl. 30: 1611-1620.

25. Nixon-George, 1990. J. Immunol. 14: 4798-4802.

26. Wallace, R. J. Jr., Nash, D. R., and Steingrube, V. A. 1990. Antibiotic susceptibilities and drug resistance in Moraxella (Branhamella) catarrhalis. Am. J. Med. 88 (5A): 46S-50S.

27. F. M. Ausubel et al., Short protocols in Molecular Biology, Greene Publishing Associates and John Wiley and Sons.

28. Schryvers, A. B., Lee, B. C. 1989. Comparative analysis of the transferrin and lactoferrin binding proteins in the family Neisseriaceae. Can. J. Microbiol. 35: 409-415.

29. Legrain, M., V. Mazarin, S. W. Irwin, B. Bouchon, M-J. Quentin-Millet, E. Jacobs, and A. B. Schryvers. 1993, Cloning and characterization of Neisseria meningitidis genes encoding the transferrin-binding proteins Tbp1 and Tbp2. Gene 130: 73-80.

30. Ogunnariwo, J. W., Woo, T. K. W., Lo, R. Y. C., Gonzalez, G. C., and Schryvers, A. B. Characterization of the Pasteurella haemolytica transferrin receptor genes and the recombinant receptor proteins. Microb. Pathog. 23:273-284 (1997).

31. Yang, Y. P., Myers, L. E., McGuinness, U., Chong, P., Kwok, Y., Klein, M. H. and Harkness R. E. The major outer membrane protein, CD, extracted from Moraxella (Branhamella) catarrhalis is a potential vaccine antigen that induces bactericidal antibodies. FEMS Immun. Med. Microbiol. 17:187-199 (1997).

32. Needleman, S. B., and Wunsch, C. D. 1970, J. Mol Biol. 48:443-453.

33. Sellers, P. H. 1974 On the theory and computation of evolutionary distances. J. Appl. Math (SIAM) 26:787-793.

34. Waterman, M. S., Smith, T. F., and Beyer, W. A. 1976. Advan. Math.20:367-387.

35. Smith, T. F., and Waterman, M. S. 1981 Identification of common molecular subsequences. J. Mol. Biol. 147:195-197.

36. Jimenez-Montano, M. and Zamora-Cortina, L. 1981 Evolutionary model for the generation of amino acid sequences and its application to the study of mammal alpha-hemoglobin chains. Proc. VII Int. Biophysics Congress, Mexico City.

37. Sobel, E. and Martinez, H. M. 1985 A Multiple Sequence Alignment Program. Nucleic Acid Res. 14:363-374.

60 3438 base pairs nucleic acid single linear 1 TATTTTGACA AGCTATACACTAAAATCAAA AATTAATCAC TTTGGTTGGG TGGTTTTAGC 60 AAGCAAATGG TTATTTTGGTAAACAATTAA GTTCTTAAAA ACGATACACG CTCATAAACA 120 GATGGTTTTT GGCATCTGCAATTTGATGCC TGCCTTGTGA TTGGTTGGGG TGTATCGGTG 180 TATCAAAGTG CAAAAGCCAACAGGTGGTCA TTGATGAATC AATCAAAACA AAACAACAAA 240 TCCAAAAAAT CCAAACAAGTATTAAAACTT AGTGCCTTGT CTTTGGGTCT GCTTAACATC 300 ACGCAGGTGG CACTGGCAAACACAACGGCC GATAAGGCGG AGGCAACAGA TAAGACAAAC 360 CTTGTTGTTG TCTTGGATGAAACTGTTGTA ACAGCGAAGA AAAACGCCCG TAAAGCCAAC 420 GAAGTTACAG GGCTTGGTAAGGTGGTCAAA ACTGCCGAGA CCATCAATAA AGAACAAGTG 480 CTAAACATTC GAGACTTAACACGCTATGAC CCTGGCATTG CTGTGGTTGA GCAAGGTCGT 540 GGGGCAAGCT CAGGCTATTCTATTCGTGGT ATGGATAAAA ATCGTGTGGC GGTATTGGTT 600 GATGGCATCA ATCAAGCCCAGCACTATGCC CTACAAGGCC CTGTGGCAGG CAAAAATTAT 660 GCCGCAGGTG GGGCAATCAACGAAATAGAA TACGAAAATG TCCGCTCCGT TGAGATTAGT 720 AAAGGTGCAA ATTCAAGTGAATACGGCTCT GGGGCATTAT CTGGCTCTGT GGCATTTGTT 780 ACCAAAACCG CCGATGACATCATCAAAGAT GGTAAAGATT GGGGCGTGCA GACCAAAACC 840 GCCTATGCCA GTAAAAATAACGCATGGGTT AATTCTGTGG CAGCAGCAGG CAAGGCAGGT 900 TCTTTTAGCG GTCTTATCATCTACACCGAC CGCCGTGGTC AAGAATACAA GGCACATGAT 960 GATGCCTATC AGGGTAGCCAAAGTTTTGAT AGAGCGGTGG CAACCACTGA CCCAAATAAC 1020 CGAACATTTT TAATAGCAAATGAATGTGCC AATGGTAATT ATGAGGCGTG TGCTGCTGGC 1080 GGTCAAACCA AACTTCAAGCCAAGCCAACC AATGTGCGTG ATAAGGTCAA TGTCAAAGAT 1140 TATACAGGTC CTAACCGCCTTATCCCAAAC CCACTCACCC AAGACAGCAA ATCCTTACTG 1200 CTTCGCCCAG GTTATCAGCTAAACGATAAG CACTATGTCG GTGGTGTGTA TGAAATCACC 1260 AAACAAAACT ACGCCATGCAAGATAAAACC GTGCCTGCTT ATCTGACGGT TCATGACATT 1320 GAAAAATCAA GGCTCAGCAACCATGCCCAA GCCAATGGCT ATTATCAAGG CAATAATCTT 1380 GGTGAACGCA TTCGTGATACCATTGGGCCA GATTCAGGTT ATGGCATCAA CTATGCTCAT 1440 GGCGTATTTT ATGATGAAAAACACCAAAAA GACCGCCTAG GGCTTGAATA TGTTTATGAC 1500 AGCAAAGGTG AAAATAAATGGTTTGATGAT GTGCGTGTGT CTTATGATAA GCAAGACATT 1560 ACGCTACGCA GCCAGCTGACCAACACGCAC TGTTCAACCT ATCCGCACAT TGACAAAAAT 1620 TGTACGCCTG ATGTCAATAAACCTTTTTCG GTAAAAGAGG TGGATAACAA TGCCTACAAA 1680 
GAACAGCACA ATTTAATCAAAGCCGTCTTT AACAAAAAAA TGGCGTTGGG CAGTACGCAT 1740 CATCACATCA ACCTGCAAGTTGGCTATGAT AAATTCAATT CAAGCCTGAG CCGTGAAGAT 1800 TATCGTTTGG CAACCCATCAGTCTTATGAA AAACTTGATT ACACCCCACC AAGTAACCCT 1860 TTGCCAGATA AGTTTAAGCCCATTTTAGGT TCAAACAACA AACCCATTTG CCTTGATGCT 1920 TATGGTTATG GTCATGACCATCCACAGGCT TGTAACGCCA AAAACAGCAC TTATCAAAAT 1980 TTTGCCATCA AAAAAGGCATAGAGCAATAC AACCAAAAAA CCAATACCGA TAAGATTGAT 2040 TATCAAGCCA TCATTGACCAATATGATAAA CAAAACCCCA ACAGCACCCT AAAACCCTTT 2100 GAGAAAATCA AACAAAGTTTGGGGCAAGAA AAATACAACA AGATAGACGA ACTTGGCTTT 2160 AAAGCTTATA AAGATTTACGCAACGAATGG GCGGGTTGGA CTAATGACAA CAGCCAACAA 2220 AATGCCAATA AAGGCACGGATAATATCTAT CAGCCAAATC AAGCAACTGT GGTCAAAGAT 2280 GACAAATGTA AATATAGCGAGACCAACAGC TATGCTGATT GCTCAACCAC TGCGCACATC 2340 AGTGGTGATA ATTATTTCATCGCTTTAAAA GACAACATGA CCATCAATAA ATATGTTGAT 2400 TTGGGGCTGG GTGCTCGCTATGACAGAATC AAACACAAAT CTGATGTGCC TTTGGTAGAC 2460 AACAGTGCCA GCAACCAGCTGTCTTGGAAT TTTGGCGTGG TCGTCAAGCC CACCAATTGG 2520 CTGGACATCG CTTATAGAAGCTCGCAAGGC TTTCGCATGC CAAGTTTTTC TGAAATGTAT 2580 GGCGAACGCT TTGGCGTAACCATCGGTAAA GGCACGCAAC ATGGCTGTAA GGGTCTTTAT 2640 TACATTTGTC AGCAGACTGTCCATCAAACC AAGCTAAAAC CTGAAAAATC CTTTAACCAA 2700 GAAATCGGAG CGACTTTACATAACCACTTA GGCAGTCTTG AGGTTAGTTA TTTTAAAAAT 2760 CGCTATACCG ATTTGATTGTTGGTAAAAGT GAAGAGATTA GAACCCTAAC CCAAGGTGAT 2820 AATGCAGGCA AACAGCGTGGTAAAGGTGAT TTGGGCTTTC ATAATGGACA AGATGCTGAT 2880 TTGACAGGCA TTAACATTCTTGGCAGACTT GACCTAAACG CTGTCAATAG TCGCCTTCCC 2940 TATGGATTAT ACTCAACACTGGCTTATAAC AAAGTTGATG TTAAAGGAAA AACCTTAAAC 3000 CCAACTTTGG CAGGAACAAACATACTGTTT GATGCCATCC AGCCATCTCG TTATGTGGTG 3060 GGGCTTGGCT ATGATGCCCCAAGCCAAAAA TGGGGAGCAA ACGCCATATT TACCCATTCT 3120 GATGCCAAAA ATCCAAGCGAGCTTTTGGCA GATAAGAACT TAGGTAATGG CAACATTCAA 3180 ACAAAACAAG CCACCAAAGCAAAATCCACG CCGTGGCAAA CACTTGATTT GTCAGGTTAT 3240 GTAAACATAA AAGATAATTTTACCTTGCGT GCTGGCGTGT ACAATGTATT TAATACCTAT 3300 TACACCACTT GGGAGGCTTTACGCCAAACA GCAGAAGGGG CGGTCAATCA GCATACAGGA 3360 CTGAGCCAAG ATAAGCATTATGGTCGCTAT 
GCCGCTCCTG GACGCAATTA CCAATTGGCA 3420 CTTGAAATGA AGTTTTAA3438 3222 base pairs nucleic acid single linear 2 ATGAATCAAT CAAAACAAAACAACAAATCC AAAAAATCCA AACAAGTATT AAAACTTAGT 60 GCCTTGTCTT TGGGTCTGCTTAACATCACG CAGGTGGCAC TGGCAAACAC AACGGCCGAT 120 AAGGCGGAGG CAACAGATAAGACAAACCTT GTTGTTGTCT TGGATGAAAC TGTTGTAACA 180 GCGAAGAAAA ACGCCCGTAAAGCCAACGAA GTTACAGGGC TTGGTAAGGT GGTCAAAACT 240 GCCGAGACCA TCAATAAAGAACAAGTGCTA AACATTCGAG ACTTAACACG CTATGACCCT 300 GGCATTGCTG TGGTTGAGCAAGGTCGTGGG GCAAGCTCAG GCTATTCTAT TCGTGGTATG 360 GATAAAAATC GTGTGGCGGTATTGGTTGAT GGCATCAATC AAGCCCAGCA CTATGCCCTA 420 CAAGGCCCTG TGGCAGGCAAAAATTATGCC GCAGGTGGGG CAATCAACGA AATAGAATAC 480 GAAAATGTCC GCTCCGTTGAGATTAGTAAA GGTGCAAATT CAAGTGAATA CGGCTCTGGG 540 GCATTATCTG GCTCTGTGGCATTTGTTACC AAAACCGCCG ATGACATCAT CAAAGATGGT 600 AAAGATTGGG GCGTGCAGACCAAAACCGCC TATGCCAGTA AAAATAACGC ATGGGTTAAT 660 TCTGTGGCAG CAGCAGGCAAGGCAGGTTCT TTTAGCGGTC TTATCATCTA CACCGACCGC 720 CGTGGTCAAG AATACAAGGCACATGATGAT GCCTATCAGG GTAGCCAAAG TTTTGATAGA 780 GCGGTGGCAA CCACTGACCCAAATAACCGA ACATTTTTAA TAGCAAATGA ATGTGCCAAT 840 GGTAATTATG AGGCGTGTGCTGCTGGCGGT CAAACCAAAC TTCAAGCCAA GCCAACCAAT 900 GTGCGTGATA AGGTCAATGTCAAAGATTAT ACAGGTCCTA ACCGCCTTAT CCCAAACCCA 960 CTCACCCAAG ACAGCAAATCCTTACTGCTT CGCCCAGGTT ATCAGCTAAA CGATAAGCAC 1020 TATGTCGGTG GTGTGTATGAAATCACCAAA CAAAACTACG CCATGCAAGA TAAAACCGTG 1080 CCTGCTTATC TGACGGTTCATGACATTGAA AAATCAAGGC TCAGCAACCA TGCCCAAGCC 1140 AATGGCTATT ATCAAGGCAATAATCTTGGT GAACGCATTC GTGATACCAT TGGGCCAGAT 1200 TCAGGTTATG GCATCAACTATGCTCATGGC GTATTTTATG ATGAAAAACA CCAAAAAGAC 1260 CGCCTAGGGC TTGAATATGTTTATGACAGC AAAGGTGAAA ATAAATGGTT TGATGATGTG 1320 CGTGTGTCTT ATGATAAGCAAGACATTACG CTACGCAGCC AGCTGACCAA CACGCACTGT 1380 TCAACCTATC CGCACATTGACAAAAATTGT ACGCCTGATG TCAATAAACC TTTTTCGGTA 1440 AAAGAGGTGG ATAACAATGCCTACAAAGAA CAGCACAATT TAATCAAAGC CGTCTTTAAC 1500 AAAAAAATGG CGTTGGGCAGTACGCATCAT CACATCAACC TGCAAGTTGG CTATGATAAA 1560 TTCAATTCAA GCCTGAGCCGTGAAGATTAT CGTTTGGCAA CCCATCAGTC TTATGAAAAA 1620 CTTGATTACA 
CCCCACCAAGTAACCCTTTG CCAGATAAGT TTAAGCCCAT TTTAGGTTCA 1680 AACAACAAAC CCATTTGCCTTGATGCTTAT GGTTATGGTC ATGACCATCC ACAGGCTTGT 1740 AACGCCAAAA ACAGCACTTATCAAAATTTT GCCATCAAAA AAGGCATAGA GCAATACAAC 1800 CAAAAAACCA ATACCGATAAGATTGATTAT CAAGCCATCA TTGACCAATA TGATAAACAA 1860 AACCCCAACA GCACCCTAAAACCCTTTGAG AAAATCAAAC AAAGTTTGGG GCAAGAAAAA 1920 TACAACAAGA TAGACGAACTTGGCTTTAAA GCTTATAAAG ATTTACGCAA CGAATGGGCG 1980 GGTTGGACTA ATGACAACAGCCAACAAAAT GCCAATAAAG GCACGGATAA TATCTATCAG 2040 CCAAATCAAG CAACTGTGGTCAAAGATGAC AAATGTAAAT ATAGCGAGAC CAACAGCTAT 2100 GCTGATTGCT CAACCACTGCGCACATCAGT GGTGATAATT ATTTCATCGC TTTAAAAGAC 2160 AACATGACCA TCAATAAATATGTTGATTTG GGGCTGGGTG CTCGCTATGA CAGAATCAAA 2220 CACAAATCTG ATGTGCCTTTGGTAGACAAC AGTGCCAGCA ACCAGCTGTC TTGGAATTTT 2280 GGCGTGGTCG TCAAGCCCACCAATTGGCTG GACATCGCTT ATAGAAGCTC GCAAGGCTTT 2340 CGCATGCCAA GTTTTTCTGAAATGTATGGC GAACGCTTTG GCGTAACCAT CGGTAAAGGC 2400 ACGCAACATG GCTGTAAGGGTCTTTATTAC ATTTGTCAGC AGACTGTCCA TCAAACCAAG 2460 CTAAAACCTG AAAAATCCTTTAACCAAGAA ATCGGAGCGA CTTTACATAA CCACTTAGGC 2520 AGTCTTGAGG TTAGTTATTTTAAAAATCGC TATACCGATT TGATTGTTGG TAAAAGTGAA 2580 GAGATTAGAA CCCTAACCCAAGGTGATAAT GCAGGCAAAC AGCGTGGTAA AGGTGATTTG 2640 GGCTTTCATA ATGGACAAGATGCTGATTTG ACAGGCATTA ACATTCTTGG CAGACTTGAC 2700 CTAAACGCTG TCAATAGTCGCCTTCCCTAT GGATTATACT CAACACTGGC TTATAACAAA 2760 GTTGATGTTA AAGGAAAAACCTTAAACCCA ACTTTGGCAG GAACAAACAT ACTGTTTGAT 2820 GCCATCCAGC CATCTCGTTATGTGGTGGGG CTTGGCTATG ATGCCCCAAG CCAAAAATGG 2880 GGAGCAAACG CCATATTTACCCATTCTGAT GCCAAAAATC CAAGCGAGCT TTTGGCAGAT 2940 AAGAACTTAG GTAATGGCAACATTCAAACA AAACAAGCCA CCAAAGCAAA ATCCACGCCG 3000 TGGCAAACAC TTGATTTGTCAGGTTATGTA AACATAAAAG ATAATTTTAC CTTGCGTGCT 3060 GGCGTGTACA ATGTATTTAATACCTATTAC ACCACTTGGG AGGCTTTACG CCAAACAGCA 3120 GAAGGGGCGG TCAATCAGCATACAGGACTG AGCCAAGATA AGCATTATGG TCGCTATGCC 3180 GCTCCTGGAC GCAATTACCAATTGGCACTT GAAATGAAGT TT 3222 2247 base pairs nucleic acid single linear3 GTAAATTTGC CGTATTTTGT CTATCATAAA TGCATTTATC AAATGCTCAA ATAAATACGC 60AAATGCACAT TGTCAGCATG 
CCAAAATAGG CATCAACAGA CTTTTTTAGA TAATACCATC 120AACCCATCAG AGGATTATTT TATGAAACAC ATTCCTTTAA CCACACTGTG TGTGGCAATC 180TCTGCCGTCT TATTAACCGC TTGTGGTGGC AGTGGTGGTT CAAATCCACC TGCTCCTACG 240CCCATTCCAA ATGCTAGCGG TTCAGGTAAT ACTGGCAACA CTGGTAATGC TGGCGGTACT 300GATAATACAG CCAATGCAGG TAATACAGGC GGTACAAACT CTGGTACAGG CAGTGCCAAC 360ACACCAGAGC CAAAATATCA AGATGTACCA ACTGAGAAAA ATGAAAAAGA TAAAGTTTCA 420TCCATTCAAG AACCTGCCAT GGGTTATGGC ATGGCTTTGA GTAAAATTAA TCTACACAAC 480CGACAAGACA CGCCATTAGA TGAAAAAAAT ATCATTACCT TAGACGGTAA AAAACAAGTT 540GCAGAAGGTA AAAAATCGCC ATTGCCATTT TCGTTAGATG TAGAAAATAA ATTGCTTGAT 600GGCTATATAG CAAAAATGAA TGTAGCGGAT AAAAATGCCA TTGGTGACAG AATTAAGAAA 660GGTAATAAAG AAATCTCCGA TGAAGAACTT GCCAAACAAA TCAAAGAAGC TGTGCGTAAA 720AGCCATGAGT TTCAGCAAGT ATTATCATCA CTGGAAAACA AAATTTTTCA TTCAAATGAC 780GGAACAACCA AAGCAACCAC ACGAGATTTA AAATATGTTG ATTATGGTTA CTACTTGGCG 840AATGATGGCA ATTATCTAAC CGTCAAAACA GACAAACTTT GGAATTTAGG CCCTGTGGGT 900GGTGTGTTTT ATAATGGCAC AACGACCGCC AAAGAGTTGC CCACACAAGA TGCGGTCAAA 960TATAAAGGAC ATTGGGACTT TATGACCGAT GTTGCCAACA GAAGAAACCG ATTTAGCGAA 1020GTGAAAGAAA ACTCTCAAGC AGGCTGGTAT TATGGAGCAT CTTCAAAAGA TGAATACAAC 1080CGCTTATTAA CTAAAGAAGA CTCTGCCCCT GATGGTCATA GCGGTGAATA TGGCCATAGC 1140AGTGAGTTTA CTGTTAATTT TAAGGAAAAA AAATTAACAG GTAAGCTGTT TAGTAACCTA 1200CAAGACCGCC ATAAGGGCAA TGTTACAAAA ACCGAACGCT ATGACATCGA TGCCAATATC 1260CACGGCAACC GCTTCCGTGG CAGTGCCACC GCAAGCAATA AAAATGACAC AAGCAAACAC 1320CCCTTTACCA GTGATGCCAA CAATAGGCTA GAAGGTGGTT TTTATGGGCC AAAAGGCGAG 1380GAGCTGGCAG GTAAATTCTT AACCAATGAC AACAAACTCT TTGGCGTCTT TGGTGCTAAA 1440CGAGAGAGTA AAGCTGAGGA AAAAACCGAA GCCATCTTAG ATGCCTATGC ACTTGGGACA 1500TTTAATACAA GTAACGCAAC CACATTCACC CCATTTACCG AAAAACAACT GGATAACTTT 1560GGCAATGCCA AAAAATTGGT CTTAGGTTCT ACCGTCATTG ATTTGGTGCC TACTGATGCC 1620ACCAAAAATG AATTCACCAA AGACAAGCCA GAGTCTGCCA CAAACGAAGC GGGCGAGACT 1680TTGATGGTGA ATGATGAAGT TAGCGTCAAA ACCTATGGCA AAAACTTTGA ATACCTAAAA 1740TTTGGTGAGC TTAGTATCGG TGGTAGCCAT AGCGTCTTTT TACAAGGCGA ACGCACCGCT 
1800ACCACAGGCG AGAAAGCCGT ACCAACCACA GGCACAGCCA AATATTTGGG GAACTGGGTA 1860GGATACATCA CAGGAAAGGA CACAGGAACG GGCACAGGAA AAAGCTTTAC CGATGCCCAA 1920GATGTTGCTG ATTTTGACAT TGATTTTGGA AATAAATCAG TCAGCGGTAA ACTTATCACC 1980AAAGGCCGCC AAGACCCTGT ATTTAGCATC ACAGGTCAAA TCGCAGGCAA TGGCTGGACA 2040GGCACAGCCA GCACCACCAA AGCGGACGCA GGAGGCTACA AGATAGATTC TAGCAGTACA 2100GGCAAATCCA TCGTCATCAA AGATGCCAAT GTTACAGGGG GCTTTTATGG TCCAAATGCA 2160AACGAGATGG GCGGGTCATT TACACACAAC GCCGATGACA GCAAAGCCTC TGTGGTCTTT 2220GGCACAAAAA GACAACAAGA AGTTAAG 2247 2106 base pairs nucleic acid singlelinear 4 ATGAAACACA TTCCTTTAAC CACACTGTGT GTGGCAATCT CTGCCGTCTTATTAACCGCT 60 TGTGGTGGCA GTGGTGGTTC AAATCCACCT GCTCCTACGC CCATTCCAAATGCTAGCGGT 120 TCAGGTAATA CTGGCAACAC TGGTAATGCT GGCGGTACTG ATAATACAGCCAATGCAGGT 180 AATACAGGCG GTACAAACTC TGGTACAGGC AGTGCCAACA CACCAGAGCCAAAATATCAA 240 GATGTACCAA CTGAGAAAAA TGAAAAAGAT AAAGTTTCAT CCATTCAAGAACCTGCCATG 300 GGTTATGGCA TGGCTTTGAG TAAAATTAAT CTACACAACC GACAAGACACGCCATTAGAT 360 GAAAAAAATA TCATTACCTT AGACGGTAAA AAACAAGTTG CAGAAGGTAAAAAATCGCCA 420 TTGCCATTTT CGTTAGATGT AGAAAATAAA TTGCTTGATG GCTATATAGCAAAAATGAAT 480 GTAGCGGATA AAAATGCCAT TGGTGACAGA ATTAAGAAAG GTAATAAAGAAATCTCCGAT 540 GAAGAACTTG CCAAACAAAT CAAAGAAGCT GTGCGTAAAA GCCATGAGTTTCAGCAAGTA 600 TTATCATCAC TGGAAAACAA AATTTTTCAT TCAAATGACG GAACAACCAAAGCAACCACA 660 CGAGATTTAA AATATGTTGA TTATGGTTAC TACTTGGCGA ATGATGGCAATTATCTAACC 720 GTCAAAACAG ACAAACTTTG GAATTTAGGC CCTGTGGGTG GTGTGTTTTATAATGGCACA 780 ACGACCGCCA AAGAGTTGCC CACACAAGAT GCGGTCAAAT ATAAAGGACATTGGGACTTT 840 ATGACCGATG TTGCCAACAG AAGAAACCGA TTTAGCGAAG TGAAAGAAAACTCTCAAGCA 900 GGCTGGTATT ATGGAGCATC TTCAAAAGAT GAATACAACC GCTTATTAACTAAAGAAGAC 960 TCTGCCCCTG ATGGTCATAG CGGTGAATAT GGCCATAGCA GTGAGTTTACTGTTAATTTT 1020 AAGGAAAAAA AATTAACAGG TAAGCTGTTT AGTAACCTAC AAGACCGCCATAAGGGCAAT 1080 GTTACAAAAA CCGAACGCTA TGACATCGAT GCCAATATCC ACGGCAACCGCTTCCGTGGC 1140 AGTGCCACCG CAAGCAATAA AAATGACACA AGCAAACACC CCTTTACCAGTGATGCCAAC 1200 AATAGGCTAG AAGGTGGTTT TTATGGGCCA AAAGGCGAGG 
AGCTGGCAGGTAAATTCTTA 1260 ACCAATGACA ACAAACTCTT TGGCGTCTTT GGTGCTAAAC GAGAGAGTAAAGCTGAGGAA 1320 AAAACCGAAG CCATCTTAGA TGCCTATGCA CTTGGGACAT TTAATACAAGTAACGCAACC 1380 ACATTCACCC CATTTACCGA AAAACAACTG GATAACTTTG GCAATGCCAAAAAATTGGTC 1440 TTAGGTTCTA CCGTCATTGA TTTGGTGCCT ACTGATGCCA CCAAAAATGAATTCACCAAA 1500 GACAAGCCAG AGTCTGCCAC AAACGAAGCG GGCGAGACTT TGATGGTGAATGATGAAGTT 1560 AGCGTCAAAA CCTATGGCAA AAACTTTGAA TACCTAAAAT TTGGTGAGCTTAGTATCGGT 1620 GGTAGCCATA GCGTCTTTTT ACAAGGCGAA CGCACCGCTA CCACAGGCGAGAAAGCCGTA 1680 CCAACCACAG GCACAGCCAA ATATTTGGGG AACTGGGTAG GATACATCACAGGAAAGGAC 1740 ACAGGAACGG GCACAGGAAA AAGCTTTACC GATGCCCAAG ATGTTGCTGATTTTGACATT 1800 GATTTTGGAA ATAAATCAGT CAGCGGTAAA CTTATCACCA AAGGCCGCCAAGACCCTGTA 1860 TTTAGCATCA CAGGTCAAAT CGCAGGCAAT GGCTGGACAG GCACAGCCAGCACCACCAAA 1920 GCGGACGCAG GAGGCTACAA GATAGATTCT AGCAGTACAG GCAAATCCATCGTCATCAAA 1980 GATGCCAATG TTACAGGGGG CTTTTATGGT CCAAATGCAA ACGAGATGGGCGGGTCATTT 2040 ACACACAACG CCGATGACAG CAAAGCCTCT GTGGTCTTTG GCACAAAAAGACAACAAGAA 2100 GTTAAG 2106 3660 base pairs nucleic acid single linear 5AATTGATACA AAATGGTTTG TATTATCACT TGTATTTGTA TTATAATTTT ACTTATTTTT 60ACAAACTATA CACTAAAATC AAAAATTAAT CACTTTGGTT GGGTGGTTTT AGCAAGCAAA 120TGGTTATTTT GGTAAACAAT TAAGTTCTTA AAAACGATAC ACGCTCATAA ACAGATGGTT 180TTTGGCATCT TCAATTTGAT GCCTGCCTTG TGATTGGTTG GGGGTGTATT GATGTATCCA 240AGTACAAAAG CCAACAGGTG GTCATTGATG AATCAATCCA AAAAATCCAA AAAATCCAAA 300CAAGTATTAA AACTTAGTGC CTTGTCTTTG GGTCTGCTTA ACATCACGCA GGTGGCACTG 360GCAAACACAA CGGCCGATAA GGCGGAGGCA ACAGATAAGA CAAACCTTGT TGTTGTCTTG 420GATGAAACTG TTGTAACAGC GAAGAAAAAC GCCCGTAAAG CCAACGAAGT TACAGGGCTT 480GGTAAGGTGG TCAAAACTGC CGAGACCATC AATAAAGAAC AAGTGCTAAA CATTCGAGAC 540TTAACACGCT ATGACCCTGG CATTGCTGTG GTTGAGCAAG GTCGTGGGGC AAGCTCAGGC 600TATTCTATTC GTGGTATGGA TAAAAATCGT GTGGCGGTAT TGGTTGATGG CATCAATCAA 660GCCCAGCACT ATGCCCTACA AGGCCCTGTG GCAGGCAAAA ATTATGCCGC AGGTGGGGCA 720ATCAACGAAA TAGAATACGA AAATGTCCGC TCCGTTGAGA TTAGTAAAGG TGCAAATTCA 780AGTGAATACG GCTCTGGGGC ATTATCTGGC 
TCTGTGGCAT TTGTTACCAA AACCGCCGAT 840GACATCATCA AAGATGGTAA AGATTGGGGC GTGCAGACCA AAACCGCCTA TGCCAGTAAA 900AATAACGCAT GGGTTAATTC TGTGGCAGCA GCAGGCAAGG CAGGTTCTTT TAGCGGTCTT 960ATCATCTACA CCGACCGCCG TGGTCAAGAA TACAAGGCAC ATGATGATGC CTATCAGGGT 1020AGCCAAAGTT TTGATAGAGC GGTGGCAACC ACTGACCCAA ATAACCCAAA ATTTTTAATA 1080GCAAATGAAT GTGCCAATGG TAATTATGAG GCGTGTGCTG CTGGCGGTCA AACCAAACTC 1140CAAGCTAAGC CAACCAATGT GCGTGATAAG GTCAATGTCA AAGATTATAC AGGTCCTAAC 1200CGCCTTATCC CAAACCCACT CACCCAAGAC AGCAAATCCT TACTGCTTCG CCCAGGTTAT 1260CAGCTAAACG ATAAGCACTA TGTCGGTGGT GTGTATGAAA TCACCAAACA AAACTACGCC 1320ATGCAAGATA AAACCGTGCC TGCTTATCTG ACGGTTCATG ACATTGAAAA ATCAAGGCTC 1380AGCAACCATG GCCAAGCCAA TGGCTATTAT CAAGGCAATA ACCTTGGTGA ACGCATTCGT 1440GATGCCATTG GGGCAAATTC AGGTTATGGC ATCAACTATG CTCATGGCGT ATTTTATGAC 1500GAAAAACACC AAAAAGACCG CCTAGGGCTT GAATATGTTT ATGACAGCAA AGGTGAAAAT 1560AAATGGTTTG ATGATGTGCG TGTGTCTTAT GACAAGCAAG ACATTACGCT ACGTAGCCAG 1620CTGACCAACA CGCACTGTTC AACCTATCCG CACATTGACA AAAATTGTAC GCCTGATGTC 1680AATAAACCTT TTTCGGTAAA AGAGGTGGAT AACAATGCCT ACAAAGAACA GCACAATTTA 1740ATCAAAGCCG TCTTTAACAA AAAAATGGCA TTGGGCAATA CGCATCATCA CATCAATCTG 1800CAAGTTGGCT ATGATAAATT CAATTCAAGC CTTAGCCGTG AAGATTATCG TTTGGCAACC 1860CATCAATCTT ATCAAAAACT TGATTACACC CCACCAAGTA ACCCTTTGCC AGATAAGTTT 1920AAGCCCATTT TAGGTTCAAA CAACAGACCC ATTTGCCTTG ATGCTTATGG TTATGGTCAT 1980GACCATCCAC AGGCTTGTAA CGCCAAAAAC AGCACTTATC AAAACTTTGC CATCAAAAAA 2040GGCATAGAGC AATACAACCA AACCAATACC GATAAGATTG ATTATCAAGC CGTCATTGAC 2100CAATATGATA AACAAAACCC CAACAGCACC CTAAAACCCT TTGAGAAAAT CAAACAAAGT 2160TTGGGGCAAG AAAAATACGA CGAGATAGAC AGACTGGGCT TTAATGCTTA TAAAGATTTA 2220CGCAACGAAT GGGCGGGTTG GACTAATGAC AACAGCCAAC AAAACGCCAA TAAAGGCACG 2280GATAATATCT ATCAGCCAAA TCAAGCAACT GTGGTCAAAG ATGACAAATG TAAATATAGC 2340GAGACCAACA GCTATGCTGA TTGCTCAACC ACTCGCCACA TCAGCGGTGA TAATTATTTC 2400ATCGCTTTAA AAGACAACAT GACCATCAAT AAATATGTTG ATTTGGGGCT GGGTGCTCGC 2460TATGACAGAA TCAAACACAA ATCTGATGTG CCTTTGGTAG ACAACAGTGC CAGCAACCAG 
2520CTGTCTTGGA ATTTTGGCGT GGTCGTCAAG CCCACCAATT GGCTGGACAT CGCTTATAGA 2580AGCTCGCAAG GCTTTCGCAT GCCAAGTTTT TCTGAAATGT ATGGCGAACG CTTTGGCGTA 2640ACCATCGGTA AAGGCACGCA ACATGGCTGT AAGGGTCTTT ATTACATTTG TCAGCAGACT 2700GTCCATCAAA CCAAGCTAAA ACCTGAAAAA TCCTTTAACC AAGAAATCGG AGCGACTTTA 2760CATAACCACT TAGGCAGTCT TGAGGTTAGT TATTTTAAAA ATCGCTATAC CGATTTGATT 2820GTTGGTAAAA GTGAAGAGAT TAGAACCCTA ACCCAAGGTG ATAATGCAGG CAAACAGCGT 2880GGTAAAGGTG ATTTGGGCTT TCATAATGGG CAAGATGCTG ATTTGACAGG CATTAACATT 2940CTTGGCAGAC TTGACCTAAA CGCTGTCAAT AGTCGCCTTC CCTATGGATT ATACTCAACA 3000CTGGCTTATA ACAAAGTTGA TGTTAAAGGA AAAACCTTAA ACCCAACTTT GGCAGGAACA 3060AACATACTGT TTGATGCCAT TCAGCCATCT CGTTATGTGG TGGGGCTTGG CTATGATGCC 3120CCAAGCCAAA AATGGGGAGC AAACGCCATA TTTACCCATT CTGATGCCAA AAATCCAAGC 3180GAGCTTTTGG CAGATAAGAA CTTAGGTAAT GGCAACAATC AAACAAAACA AGCCACCAAA 3240GCAAAATCCA CGCCGTGGCA AACACTTGAT TTGTCAGGTT ATGTAAACAT AAAAGATAAT 3300TTTACCTTGC GTGCTGGCGT GTACAATGTA TTTAATACCT ATTACACCAC TTGGGAGGCT 3360TTACGCCAAA CAGCAGAAGG GGCGGTCAAT CAGCATACAG GACTGAGCCA AGATAAGCAT 3420TATGGTCGCT ATGCCGCTCC TGGACGCAAT TACCAATTGG CACTTGAAAT GAAGTTTTAA 3480CCAGTGGCTT TGATGTGATC ATGCCAAATC CCAATCAACC AATGAATAAA GCCCCCATCT 3540ACCATGAGGG CTTTATTTTA TCATCGCTGA GTATGCTCTT AGCGGTCATC ACTCAGATTA 3600GTCATTAATT TATTAGCGAT TAATTTATTA GTAATCACGC TGCTCTTTGA TGATTTTAAG 36603210 base pairs nucleic acid single linear 6 ATGAATCAAT CCAAAAAATCCAAAAAATCC AAACAAGTAT TAAAACTTAG TGCCTTGTCT 60 TTGGGTCTGC TTAACATCACGCAGGTGGCA CTGGCAAACA CAACGGCCGA TAAGGCGGAG 120 GCAACAGATA AGACAAACCTTGTTGTTGTC TTGGATGAAA CTGTTGTAAC AGCGAAGAAA 180 AACGCCCGTA AAGCCAACGAAGTTACAGGG CTTGGTAAGG TGGTCAAAAC TGCCGAGACC 240 ATCAATAAAG AACAAGTGCTAAACATTCGA GACTTAACAC GCTATGACCC TGGCATTGCT 300 GTGGTTGAGC AAGGTCGTGGGGCAAGCTCA GGCTATTCTA TTCGTGGTAT GGATAAAAAT 360 CGTGTGGCGG TATTGGTTGATGGCATCAAT CAAGCCCAGC ACTATGCCCT ACAAGGCCCT 420 GTGGCAGGCA AAAATTATGCCGCAGGTGGG GCAATCAACG AAATAGAATA CGAAAATGTC 480 CGCTCCGTTG AGATTAGTAAAGGTGCAAAT TCAAGTGAAT ACGGCTCTGG GGCATTATCT 540 
GGCTCTGTGG CATTTGTTACCAAAACCGCC GATGACATCA TCAAAGATGG TAAAGATTGG 600 GGCGTGCAGA CCAAAACCGCCTATGCCAGT AAAAATAACG CATGGGTTAA TTCTGTGGCA 660 GCAGCAGGCA AGGCAGGTTCTTTTAGCGGT CTTATCATCT ACACCGACCG CCGTGGTCAA 720 GAATACAAGG CACATGATGATGCCTATCAG GGTAGCCAAA GTTTTGATAG AGCGGTGGCA 780 ACCACTGACC CAAATAACCCAAAATTTTTA ATAGCAAATG AATGTGCCAA TGGTAATTAT 840 GAGGCGTGTG CTGCTGGCGGTCAAACCAAA CTCCAAGCTA AGCCAACCAA TGTGCGTGAT 900 AAGGTCAATG TCAAAGATTATACAGGTCCT AACCGCCTTA TCCCAAACCC ACTCACCCAA 960 GACAGCAAAT CCTTACTGCTTCGCCCAGGT TATCAGCTAA ACGATAAGCA CTATGTCGGT 1020 GGTGTGTATG AAATCACCAAACAAAACTAC GCCATGCAAG ATAAAACCGT GCCTGCTTAT 1080 CTGACGGTTC ATGACATTGAAAAATCAAGG CTCAGCAACC ATGGCCAAGC CAATGGCTAT 1140 TATCAAGGCA ATAACCTTGGTGAACGCATT CGTGATGCCA TTGGGGCAAA TTCAGGTTAT 1200 GGCATCAACT ATGCTCATGGCGTATTTTAT GACGAAAAAC ACCAAAAAGA CCGCCTAGGG 1260 CTTGAATATG TTTATGACAGCAAAGGTGAA AATAAATGGT TTGATGATGT GCGTGTGTCT 1320 TATGACAAGC AAGACATTACGCTACGTAGC CAGCTGACCA ACACGCACTG TTCAACCTAT 1380 CCGCACATTG ACAAAAATTGTACGCCTGAT GTCAATAAAC CTTTTTCGGT AAAAGAGGTG 1440 GATAACAATG CCTACAAAGAACAGCACAAT TTAATCAAAG CCGTCTTTAA CAAAAAAATG 1500 GCATTGGGCA ATACGCATCATCACATCAAT CTGCAAGTTG GCTATGATAA ATTCAATTCA 1560 AGCCTTAGCC GTGAAGATTATCGTTTGGCA ACCCATCAAT CTTATCAAAA ACTTGATTAC 1620 ACCCCACCAA GTAACCCTTTGCCAGATAAG TTTAAGCCCA TTTTAGGTTC AAACAACAGA 1680 CCCATTTGCC TTGATGCTTATGGTTATGGT CATGACCATC CACAGGCTTG TAACGCCAAA 1740 AACAGCACTT ATCAAAACTTTGCCATCAAA AAAGGCATAG AGCAATACAA CCAAACCAAT 1800 ACCGATAAGA TTGATTATCAAGCCGTCATT GACCAATATG ATAAACAAAA CCCCAACAGC 1860 ACCCTAAAAC CCTTTGAGAAAATCAAACAA AGTTTGGGGC AAGAAAAATA CGACGAGATA 1920 GACAGACTGG GCTTTAATGCTTATAAAGAT TTACGCAACG AATGGGCGGG TTGGACTAAT 1980 GACAACAGCC AACAAAACGCCAATAAAGGC ACGGATAATA TCTATCAGCC AAATCAAGCA 2040 ACTGTGGTCA AAGATGACAAATGTAAATAT AGCGAGACCA ACAGCTATGC TGATTGCTCA 2100 ACCACTCGCC ACATCAGCGGTGATAATTAT TTCATCGCTT TAAAAGACAA CATGACCATC 2160 AATAAATATG TTGATTTGGGGCTGGGTGCT CGCTATGACA GAATCAAACA CAAATCTGAT 2220 GTGCCTTTGG TAGACAACAGTGCCAGCAAC CAGCTGTCTT 
GGAATTTTGG CGTGGTCGTC 2280 AAGCCCACCA ATTGGCTGGACATCGCTTAT AGAAGCTCGC AAGGCTTTCG CATGCCAAGT 2340 TTTTCTGAAA TGTATGGCGAACGCTTTGGC GTAACCATCG GTAAAGGCAC GCAACATGGC 2400 TGTAAGGGTC TTTATTACATTTGTCAGCAG ACTGTCCATC AAACCAAGCT AAAACCTGAA 2460 AAATCCTTTA ACCAAGAAATCGGAGCGACT TTACATAACC ACTTAGGCAG TCTTGAGGTT 2520 AGTTATTTTA AAAATCGCTATACCGATTTG ATTGTTGGTA AAAGTGAAGA GATTAGAACC 2580 CTAACCCAAG GTGATAATGCAGGCAAACAG CGTGGTAAAG GTGATTTGGG CTTTCATAAT 2640 GGGCAAGATG CTGATTTGACAGGCATTAAC ATTCTTGGCA GACTTGACCT AAACGCTGTC 2700 AATAGTCGCC TTCCCTATGGATTATACTCA ACACTGGCTT ATAACAAAGT TGATGTTAAA 2760 GGAAAAACCT TAAACCCAACTTTGGCAGGA ACAAACATAC TGTTTGATGC CATTCAGCCA 2820 TCTCGTTATG TGGTGGGGCTTGGCTATGAT GCCCCAAGCC AAAAATGGGG AGCAAACGCC 2880 ATATTTACCC ATTCTGATGCCAAAAATCCA AGCGAGCTTT TGGCAGATAA GAACTTAGGT 2940 AATGGCAACA ATCAAACAAAACAAGCCACC AAAGCAAAAT CCACGCCGTG GCAAACACTT 3000 GATTTGTCAG GTTATGTAAACATAAAAGAT AATTTTACCT TGCGTGCTGG CGTGTACAAT 3060 GTATTTAATA CCTATTACACCACTTGGGAG GCTTTACGCC AAACAGCAGA AGGGGCGGTC 3120 AATCAGCATA CAGGACTGAGCCAAGATAAG CATTATGGTC GCTATGCCGC TCCTGGACGC 3180 AATTACCAAT TGGCACTTGAAATGAAGTTT 3210 3435 base pairs nucleic acid single linear 7 CCTAGGGCTGACAGTAACAA CACTTTATAC AGCACATCAT TGATTTATTA CCCAAATGCC 60 ACACGCTATTATCTTTTGGG GGCAGACTTT TATGATGAAA AAGTGCCACA AGACCCATCT 120 GACAGCTATGAGCGTCGTGG CATACGCACA GCTTGGGGGC AAGAATGGGC GGGCGGTCTT 180 TCAAGCCGTGCCCAAATCAG CATCAACAAA CGCCATTACC AAGGAGCAAA CCTAACCAGC 240 GGTGGACAAATTCGCCAGGA TAAACAGATG CAAGCGTCTT TATCGCTTTG GCACAGAGAC 300 ATTCACAAATGGGGCATCAC GCCACGGCTG ACCATCAGCA CAAACATCAA TAAAAGCAAT 360 GACATCAAGGCAAATTATCA CAAAAATCAA ATGTTTGTTG AGTTTAGTCG CATTTTTTGA 420 TGGGATAAGCATGCCCTACT TTTGTTTTTT GTAAAAAAAT GTACCATCAT AGACAATATC 480 AAGAAAAAATCAAGAAAAAA GATTACAAAT TTAATGATAA TTGTTATTGT TTATGTTATT 540 ATTTATCAATGTAAATTTGC CGTATTTTGT CCATCATAAA CGCATTTATC AAATGCTCAA 600 ATAAATACGCCAAATGCACA TTGTCAACAT GCCAAAATAG GCATTAACAG ACTTTTTTAG 660 ATAATACCATCAACCCATCA GAGGATTATT TTATGAAACA CATTCCTTTA ACCACACTGT 720 
GTGTGGCAATCTCTGCCGTC TTATTAACCG CTTGTGGTGG TAGCAGTGGT GGTTTCAATC 780 CACCTGCCTCTACGCCCATC CCAAATGCAG GTAATTCAGG TAATGCTGGC AATGCTGGCA 840 ATGCTGGCGGTACTGGCGGT GCAAACTCTG GTGCAGGTAA TGCTGGCGGT ACTGGCGGTG 900 CAAACTCTGGTGCAGGCAGT GCCAGCACAC CAGAACCAAA ATATAAAGAT GTGCCAACCG 960 ATGAAAATAAAAAAGCTGAA GTTTCAGGCA TTCAAGAACC TGCCATGGGT TATGGCGTGG 1020 AATTAAAGCTTCGTAACTGG ATACCACAAG AACAGGAAGA ACATGCCAAA ATCAATACAA 1080 ATGATGTTGTAAAACTTGAA GGTGACTTGA AGCATAATCC ATTTGACAAC TCTATTTGGC 1140 AAAACATCAAAAATAGCAAA GAAGTACAAA CTGTTTACAA CCAAGAGAAG CAAAACATTG 1200 AAGATCAAATCAAAAGAGAA AATAAACAAC GCCCTGACAA AAAACTTGAT GACGTGGCAC 1260 TACAAGCTTATATTGAAAAA GTTCTTGATG ACCGTCTAAC AGAACTTGCT AAACCCATTT 1320 ATGAAAAAAATATTAATTAT TCACATGATA AGCAGAATAA AGCACGCACT CGTGATTTGA 1380 AGTATGTGCGTTCTGGTTAT ATTTATCGCT CAGGTTATTC TAATATCATT CCAAAGAAAA 1440 TAGCTAAAACTGGTTTTGAT GGTGCTTTAT TTTATCAAGG TACACAAACT GCTAAACAAT 1500 TGCCTGTATCTCAAGTTAAG TATAAAGGCA CTTGGGATTT TATGACCGAT GCCAAAAAAG 1560 GACAATCATTTAGCAGTTTT GGTACATCGC AACGTCTTGC TGGTGATCGT TATAGTGCAA 1620 TGTCTTACCATGAATACCCA TCTTTATTAA CTGATGAGAA AAACAAACCA GATAATTATA 1680 ACGGTGAATATGGTCATAGC AGTGAGTTTA CGGTAGATTT TAGTAAAAAG AGCCTAAAAG 1740 GTGAGCTGTCTAGTAACATA CAAGACGGCC ATAAGGGCAG TGTTAATAAA ACCAAACGCT 1800 ATGACATCGATGCCAATATC TACGGCAACC GCTTCCGTGG CAGTGCCACC GCAAGCGATA 1860 CAACAGAAGCAAGCAAAAGC AAACACCCCT TTACCAGCGA TGCCAAAAAT AGCCTAGAAG 1920 GCGGTTTTTATGGACCAAAC GCCGAGGAGC TGGCAGGTAA ATTCCTAACC AATGACAACA 1980 AACTCTTTGGCGTCTTTGGT GCTAAACGAG AGAGTGAAGC TAAGGAAAAA ACCGAAGCCA 2040 TCTTAGATGCCTATGCACTT GGGACATTTA ATAAACCTGG TACGACCAAT CCCGCCTTTA 2100 CCGCTAACAGCAAAAAAGAA CTGGATAACT TTGGCAATGC CAAAAAGTTG GTCTTGGGTT 2160 CTACCGTCATTGATTTGGTG CCTACCGGTG CCACCAAAGA TGTCAATGAA TTCAAAGAAA 2220 AGCCAAAGTCTGCCACAAAC AAAGCGGGCG AGACTTTGAT GGTGAATGAT GAAGTTATCG 2280 TCAAAACCTATGGCTATGGC AGAAACTTTG AATACCTAAA ATTTGGTGAG CTTAGTATCG 2340 GTGGTAGCCATAGCGTCTTT TTACAAGGCG AACGCACCGC TGAGAAAGCC GTACCAACCG 2400 AAGGCACAGCCAAATATCTG GGGAACTGGG TAGGATACAT 
CACAGGAAAG GACACAGGAA 2460 CGAGCACAGGAAAAAGCTTT AATGAGGCCC AAGATATTGC TGATTTTGAC ATTGACTTTG 2520 AGAGAAAATCAGTTAAAGGC AAACTGACCA CCCAAGGCCG CCAAGACCCT GTATTTAACA 2580 TCACAGGTCAAATCGCAGGT AATGGCTGGA CAGGCACAGC CAGCACCGCC AAAGCGAACG 2640 TAGGGGGCTACAAGATAGAT TCTAGCAGTA CAGGCAAATC CATCGTCATC GAAAATGCCA 2700 AGGTTACAGGTGGCTTTTAT GGTCCAAATG CAAACGAGAT GGGCGGGTCA TTTACACACG 2760 ATACCGATGACAGTAAAGCC TCTGTGGTCT TTGGCACAAA AAGACAAGAA GAAGTTAAGT 2820 AGTAATTTAAACACAATGCT TGGTTCGGCT GATGGGATTG ACGCTTAATC AAACATGAAT 2880 GATTAAGATGATAAACCCAA GCCATGCCAA TGATTGATAG CAACGATGGC AGATGATGAG 2940 TTTTCATTATCTGCCATTAT TATTGCTTAA TTATTGCTTG TCATTTGGTG GTGTTATCAC 3000 ATTAATCATTAAAATTAACA TAATAAATGA TTAAATGATA TTTAATGAAA GTCAGGGTTA 3060 TTTTGGTCATGGTTTTTCAT GATTATTTAA CTTATAATGC GTTATGGTTA GCAAAAAGCT 3120 AAGTCTGTCAATGAAGCTAT GGTGAGTGAT TGTGCAAAAG ATGGTCAAAA AAATCGGTAT 3180 GGTGCTGTCAGGCGTGGTGA TGGTTCTGTT AATGATAATA ACAACGCCAA GCCATGCTAC 3240 TGCCAAGTTGTTGCCGACCT CTCAAGAAAA TCCAACCAAA ACTATGGTAG ATAGCTTTGG 3300 TCGTGAAACGCCACGAGGGG CAGTTCAGGG GCTATTGCGT GCAATTGCAG CAGAAGACTA 3360 TGAGCTGGCTGCCAACTATT TGGACGGCCG TTATTTGGCA AAAACCCAAA CGCCCAATCG 3420 TGAGATTGTTGAGCA 3435 2127 base pairs nucleic acid single linear 8 ATGAAACACATTCCTTTAAC CACACTGTGT GTGGCAATCT CTGCCGTCTT ATTAACCGCT 60 TGTGGTGGTAGCAGTGGTGG TTTCAATCCA CCTGCCTCTA CGCCCATCCC AAATGCAGGT 120 AATTCAGGTAATGCTGGCAA TGCTGGCAAT GCTGGCGGTA CTGGCGGTGC AAACTCTGGT 180 GCAGGTAATGCTGGCGGTAC TGGCGGTGCA AACTCTGGTG CAGGCAGTGC CAGCACACCA 240 GAACCAAAATATAAAGATGT GCCAACCGAT GAAAATAAAA AAGCTGAAGT TTCAGGCATT 300 CAAGAACCTGCCATGGGTTA TGGCGTGGAA TTAAAGCTTC GTAACTGGAT ACCACAAGAA 360 CAGGAAGAACATGCCAAAAT CAATACAAAT GATGTTGTAA AACTTGAAGG TGACTTGAAG 420 CATAATCCATTTGACAACTC TATTTGGCAA AACATCAAAA ATAGCAAAGA AGTACAAACT 480 GTTTACAACCAAGAGAAGCA AAACATTGAA GATCAAATCA AAAGAGAAAA TAAACAACGC 540 CCTGACAAAAAACTTGATGA CGTGGCACTA CAAGCTTATA TTGAAAAAGT TCTTGATGAC 600 CGTCTAACAGAACTTGCTAA ACCCATTTAT GAAAAAAATA TTAATTATTC ACATGATAAG 660 CAGAATAAAGCACGCACTCG 
TGATTTGAAG TATGTGCGTT CTGGTTATAT TTATCGCTCA 720 GGTTATTCTAATATCATTCC AAAGAAAATA GCTAAAACTG GTTTTGATGG TGCTTTATTT 780 TATCAAGGTACACAAACTGC TAAACAATTG CCTGTATCTC AAGTTAAGTA TAAAGGCACT 840 TGGGATTTTATGACCGATGC CAAAAAAGGA CAATCATTTA GCAGTTTTGG TACATCGCAA 900 CGTCTTGCTGGTGATCGTTA TAGTGCAATG TCTTACCATG AATACCCATC TTTATTAACT 960 GATGAGAAAAACAAACCAGA TAATTATAAC GGTGAATATG GTCATAGCAG TGAGTTTACG 1020 GTAGATTTTAGTAAAAAGAG CCTAAAAGGT GAGCTGTCTA GTAACATACA AGACGGCCAT 1080 AAGGGCAGTGTTAATAAAAC CAAACGCTAT GACATCGATG CCAATATCTA CGGCAACCGC 1140 TTCCGTGGCAGTGCCACCGC AAGCGATACA ACAGAAGCAA GCAAAAGCAA ACACCCCTTT 1200 ACCAGCGATGCCAAAAATAG CCTAGAAGGC GGTTTTTATG GACCAAACGC CGAGGAGCTG 1260 GCAGGTAAATTCCTAACCAA TGACAACAAA CTCTTTGGCG TCTTTGGTGC TAAACGAGAG 1320 AGTGAAGCTAAGGAAAAAAC CGAAGCCATC TTAGATGCCT ATGCACTTGG GACATTTAAT 1380 AAACCTGGTACGACCAATCC CGCCTTTACC GCTAACAGCA AAAAAGAACT GGATAACTTT 1440 GGCAATGCCAAAAAGTTGGT CTTGGGTTCT ACCGTCATTG ATTTGGTGCC TACCGGTGCC 1500 ACCAAAGATGTCAATGAATT CAAAGAAAAG CCAAAGTCTG CCACAAACAA AGCGGGCGAG 1560 ACTTTGATGGTGAATGATGA AGTTATCGTC AAAACCTATG GCTATGGCAG AAACTTTGAA 1620 TACCTAAAATTTGGTGAGCT TAGTATCGGT GGTAGCCATA GCGTCTTTTT ACAAGGCGAA 1680 CGCACCGCTGAGAAAGCCGT ACCAACCGAA GGCACAGCCA AATATCTGGG GAACTGGGTA 1740 GGATACATCACAGGAAAGGA CACAGGAACG AGCACAGGAA AAAGCTTTAA TGAGGCCCAA 1800 GATATTGCTGATTTTGACAT TGACTTTGAG AGAAAATCAG TTAAAGGCAA ACTGACCACC 1860 CAAGGCCGCCAAGACCCTGT ATTTAACATC ACAGGTCAAA TCGCAGGTAA TGGCTGGACA 1920 GGCACAGCCAGCACCGCCAA AGCGAACGTA GGGGGCTACA AGATAGATTC TAGCAGTACA 1980 GGCAAATCCATCGTCATCGA AAATGCCAAG GTTACAGGTG GCTTTTATGG TCCAAATGCA 2040 AACGAGATGGGCGGGTCATT TACACACGAT ACCGATGACA GTAAAGCCTC TGTGGTCTTT 2100 GGCACAAAAAGACAAGAAGA AGTTAAG 2127 1074 amino acids amino acid single linear 9 MetAsn Gln Ser Lys Gln Asn Asn Lys Ser Lys Lys Ser Lys Gln Val 1 5 10 15Leu Lys Leu Ser Ala Leu Ser Leu Gly Leu Leu Asn Ile Thr Gln Val 20 25 30Ala Leu Ala Asn Thr Thr Ala Asp Lys Ala Glu Ala Thr Asp Lys Thr 35 40 45Asn Leu Val Val Val Leu Asp Glu Thr Val Val Thr Ala 
Lys Lys Asn 50 55 60Ala Arg Lys Ala Asn Glu Val Thr Gly Leu Gly Lys Val Val Lys Thr 65 70 7580 Ala Glu Thr Ile Asn Lys Glu Gln Val Leu Asn Ile Arg Asp Leu Thr 85 9095 Arg Tyr Asp Pro Gly Ile Ala Val Val Glu Gln Gly Arg Gly Ala Ser 100105 110 Ser Gly Tyr Ser Ile Arg Gly Met Asp Lys Asn Arg Val Ala Val Leu115 120 125 Val Asp Gly Ile Asn Gln Ala Gln His Tyr Ala Leu Gln Gly ProVal 130 135 140 Ala Gly Lys Asn Tyr Ala Ala Gly Gly Ala Ile Asn Glu IleGlu Tyr 145 150 155 160 Glu Asn Val Arg Ser Val Glu Ile Ser Lys Gly AlaAsn Ser Ser Glu 165 170 175 Tyr Gly Ser Gly Ala Leu Ser Gly Ser Val AlaPhe Val Thr Lys Thr 180 185 190 Ala Asp Asp Ile Ile Lys Asp Gly Lys AspTrp Gly Val Gln Thr Lys 195 200 205 Thr Ala Tyr Ala Ser Lys Asn Asn AlaTrp Val Asn Ser Val Ala Ala 210 215 220 Ala Gly Lys Ala Gly Ser Phe SerGly Leu Ile Ile Tyr Thr Asp Arg 225 230 235 240 Arg Gly Gln Glu Tyr LysAla His Asp Asp Ala Tyr Gln Gly Ser Gln 245 250 255 Ser Phe Asp Arg AlaVal Ala Thr Thr Asp Pro Asn Asn Arg Thr Phe 260 265 270 Leu Ile Ala AsnGlu Cys Ala Asn Gly Asn Tyr Glu Ala Cys Ala Ala 275 280 285 Gly Gly GlnThr Lys Leu Gln Ala Lys Pro Thr Asn Val Arg Asp Lys 290 295 300 Val AsnVal Lys Asp Tyr Thr Gly Pro Asn Arg Leu Ile Pro Asn Pro 305 310 315 320Leu Thr Gln Asp Ser Lys Ser Leu Leu Leu Arg Pro Gly Tyr Gln Leu 325 330335 Asn Asp Lys His Tyr Val Gly Gly Val Tyr Glu Ile Thr Lys Gln Asn 340345 350 Tyr Ala Met Gln Asp Lys Thr Val Pro Ala Tyr Leu Thr Val His Asp355 360 365 Ile Glu Lys Ser Arg Leu Ser Asn His Ala Gln Ala Asn Gly TyrTyr 370 375 380 Gln Gly Asn Asn Leu Gly Glu Arg Ile Arg Asp Thr Ile GlyPro Asp 385 390 395 400 Ser Gly Tyr Gly Ile Asn Tyr Ala His Gly Val PheTyr Asp Glu Lys 405 410 415 His Gln Lys Asp Arg Leu Gly Leu Glu Tyr ValTyr Asp Ser Lys Gly 420 425 430 Glu Asn Lys Trp Phe Asp Asp Val Arg ValSer Tyr Asp Lys Gln Asp 435 440 445 Ile Thr Leu Arg Ser Gln Leu Thr AsnThr His Cys Ser Thr Tyr Pro 450 455 460 His Ile Asp Lys Asn Cys Thr ProAsp Val Asn Lys Pro Phe Ser Val 465 470 475 480 Lys Glu Val 
Asp Asn AsnAla Tyr Lys Glu Gln His Asn Leu Ile Lys 485 490 495 Ala Val Phe Asn LysLys Met Ala Leu Gly Ser Thr His His His Ile 500 505 510 Asn Leu Gln ValGly Tyr Asp Lys Phe Asn Ser Ser Leu Ser Arg Glu 515 520 525 Asp Tyr ArgLeu Ala Thr His Gln Ser Tyr Glu Lys Leu Asp Tyr Thr 530 535 540 Pro ProSer Asn Pro Leu Pro Asp Lys Phe Lys Pro Ile Leu Gly Ser 545 550 555 560Asn Asn Lys Pro Ile Cys Leu Asp Ala Tyr Gly Tyr Gly His Asp His 565 570575 Pro Gln Ala Cys Asn Ala Lys Asn Ser Thr Tyr Gln Asn Phe Ala Ile 580585 590 Lys Lys Gly Ile Glu Gln Tyr Asn Gln Lys Thr Asn Thr Asp Lys Ile595 600 605 Asp Tyr Gln Ala Ile Ile Asp Gln Tyr Asp Lys Gln Asn Pro AsnSer 610 615 620 Thr Leu Lys Pro Phe Glu Lys Ile Lys Gln Ser Leu Gly GlnGlu Lys 625 630 635 640 Tyr Asn Lys Ile Asp Glu Leu Gly Phe Lys Ala TyrLys Asp Leu Arg 645 650 655 Asn Glu Trp Ala Gly Trp Thr Asn Asp Asn SerGln Gln Asn Ala Asn 660 665 670 Lys Gly Thr Asp Asn Ile Tyr Gln Pro AsnGln Ala Thr Val Val Lys 675 680 685 Asp Asp Lys Cys Lys Tyr Ser Glu ThrAsn Ser Tyr Ala Asp Cys Ser 690 695 700 Thr Thr Ala His Ile Ser Gly AspAsn Tyr Phe Ile Ala Leu Lys Asp 705 710 715 720 Asn Met Thr Ile Asn LysTyr Val Asp Leu Gly Leu Gly Ala Arg Tyr 725 730 735 Asp Arg Ile Lys HisLys Ser Asp Val Pro Leu Val Asp Asn Ser Ala 740 745 750 Ser Asn Gln LeuSer Trp Asn Phe Gly Val Val Val Lys Pro Thr Asn 755 760 765 Trp Leu AspIle Ala Tyr Arg Ser Ser Gln Gly Phe Arg Met Pro Ser 770 775 780 Phe SerGlu Met Tyr Gly Glu Arg Phe Gly Val Thr Ile Gly Lys Gly 785 790 795 800Thr Gln His Gly Cys Lys Gly Leu Tyr Tyr Ile Cys Gln Gln Thr Val 805 810815 His Gln Thr Lys Leu Lys Pro Glu Lys Ser Phe Asn Gln Glu Ile Gly 820825 830 Ala Thr Leu His Asn His Leu Gly Ser Leu Glu Val Ser Tyr Phe Lys835 840 845 Asn Arg Tyr Thr Asp Leu Ile Val Gly Lys Ser Glu Glu Ile ArgThr 850 855 860 Leu Thr Gln Gly Asp Asn Ala Gly Lys Gln Arg Gly Lys GlyAsp Leu 865 870 875 880 Gly Phe His Asn Gly Gln Asp Ala Asp Leu Thr GlyIle Asn Ile Leu 885 890 895 Gly Arg Leu Asp Leu Asn Ala Val Asn Ser 
ArgLeu Pro Tyr Gly Leu 900 905 910 Tyr Ser Thr Leu Ala Tyr Asn Lys Val AspVal Lys Gly Lys Thr Leu 915 920 925 Asn Pro Thr Leu Ala Gly Thr Asn IleLeu Phe Asp Ala Ile Gln Pro 930 935 940 Ser Arg Tyr Val Val Gly Leu GlyTyr Asp Ala Pro Ser Gln Lys Trp 945 950 955 960 Gly Ala Asn Ala Ile PheThr His Ser Asp Ala Lys Asn Pro Ser Glu 965 970 975 Leu Leu Ala Asp LysAsn Leu Gly Asn Gly Asn Ile Gln Thr Lys Gln 980 985 990 Ala Thr Lys AlaLys Ser Thr Pro Trp Gln Thr Leu Asp Leu Ser Gly 995 1000 1005 Tyr ValAsn Ile Lys Asp Asn Phe Thr Leu Arg Ala Gly Val Tyr Asn 1010 1015 1020Val Phe Asn Thr Tyr Tyr Thr Thr Trp Glu Ala Leu Arg Gln Thr Ala 10251030 1035 1040 Glu Gly Ala Val Asn Gln His Thr Gly Leu Ser Gln Asp LysHis Tyr 1045 1050 1055 Gly Arg Tyr Ala Ala Pro Gly Arg Asn Tyr Gln LeuAla Leu Glu Met 1060 1065 1070 Lys Phe 1053 amino acids amino acidsingle linear 10 Leu Ser Leu Gly Leu Leu Asn Ile Thr Gln Val Ala Leu AlaAsn Thr 1 5 10 15 Thr Ala Asp Lys Ala Glu Ala Thr Asp Lys Thr Asn LeuVal Val Val 20 25 30 Leu Asp Glu Thr Val Val Thr Ala Lys Lys Asn Ala ArgLys Ala Asn 35 40 45 Glu Val Thr Gly Leu Gly Lys Val Val Lys Thr Ala GluThr Ile Asn 50 55 60 Lys Glu Gln Val Leu Asn Ile Arg Asp Leu Thr Arg TyrAsp Pro Gly 65 70 75 80 Ile Ala Val Val Glu Gln Gly Arg Gly Ala Ser SerGly Tyr Ser Ile 85 90 95 Arg Gly Met Asp Lys Asn Arg Val Ala Val Leu ValAsp Gly Ile Asn 100 105 110 Gln Ala Gln His Tyr Ala Leu Gln Gly Pro ValAla Gly Lys Asn Tyr 115 120 125 Ala Ala Gly Gly Ala Ile Asn Glu Ile GluTyr Glu Asn Val Arg Ser 130 135 140 Val Glu Ile Ser Lys Gly Ala Asn SerSer Glu Tyr Gly Ser Gly Ala 145 150 155 160 Leu Ser Gly Ser Val Ala PheVal Thr Lys Thr Ala Asp Asp Ile Ile 165 170 175 Lys Asp Gly Lys Asp TrpGly Val Gln Thr Lys Thr Ala Tyr Ala Ser 180 185 190 Lys Asn Asn Ala TrpVal Asn Ser Val Ala Ala Ala Gly Lys Ala Gly 195 200 205 Ser Phe Ser GlyLeu Ile Ile Tyr Thr Asp Arg Arg Gly Gln Glu Tyr 210 215 220 Lys Ala HisAsp Asp Ala Tyr Gln Gly Ser Gln Ser Phe Asp Arg Ala 225 230 235 240 ValAla Thr Thr Asp 
Pro Asn Asn Arg Thr Phe Leu Ile Ala Asn Glu 245 250 255Cys Ala Asn Gly Asn Tyr Glu Ala Cys Ala Ala Gly Gly Gln Thr Lys 260 265270 Leu Gln Ala Lys Pro Thr Asn Val Arg Asp Lys Val Asn Val Lys Asp 275280 285 Tyr Thr Gly Pro Asn Arg Leu Ile Pro Asn Pro Leu Thr Gln Asp Ser290 295 300 Lys Ser Leu Leu Leu Arg Pro Gly Tyr Gln Leu Asn Asp Lys HisTyr 305 310 315 320 Val Gly Gly Val Tyr Glu Ile Thr Lys Gln Asn Tyr AlaMet Gln Asp 325 330 335 Lys Thr Val Pro Ala Tyr Leu Thr Val His Asp IleGlu Lys Ser Arg 340 345 350 Leu Ser Asn His Ala Gln Ala Asn Gly Tyr TyrGln Gly Asn Asn Leu 355 360 365 Gly Glu Arg Ile Arg Asp Thr Ile Gly ProAsp Ser Gly Tyr Gly Ile 370 375 380 Asn Tyr Ala His Gly Val Phe Tyr AspGlu Lys His Gln Lys Asp Arg 385 390 395 400 Leu Gly Leu Glu Tyr Val TyrAsp Ser Lys Gly Glu Asn Lys Trp Phe 405 410 415 Asp Asp Val Arg Val SerTyr Asp Lys Gln Asp Ile Thr Leu Arg Ser 420 425 430 Gln Leu Thr Asn ThrHis Cys Ser Thr Tyr Pro His Ile Asp Lys Asn 435 440 445 Cys Thr Pro AspVal Asn Lys Pro Phe Ser Val Lys Glu Val Asp Asn 450 455 460 Asn Ala TyrLys Glu Gln His Asn Leu Ile Lys Ala Val Phe Asn Lys 465 470 475 480 LysMet Ala Leu Gly Ser Thr His His His Ile Asn Leu Gln Val Gly 485 490 495Tyr Asp Lys Phe Asn Ser Ser Leu Ser Arg Glu Asp Tyr Arg Leu Ala 500 505510 Thr His Gln Ser Tyr Glu Lys Leu Asp Tyr Thr Pro Pro Ser Asn Pro 515520 525 Leu Pro Asp Lys Phe Lys Pro Ile Leu Gly Ser Asn Asn Lys Pro Ile530 535 540 Cys Leu Asp Ala Tyr Gly Tyr Gly His Asp His Pro Gln Ala CysAsn 545 550 555 560 Ala Lys Asn Ser Thr Tyr Gln Asn Phe Ala Ile Lys LysGly Ile Glu 565 570 575 Gln Tyr Asn Gln Lys Thr Asn Thr Asp Lys Ile AspTyr Gln Ala Ile 580 585 590 Ile Asp Gln Tyr Asp Lys Gln Asn Pro Asn SerThr Leu Lys Pro Phe 595 600 605 Glu Lys Ile Lys Gln Ser Leu Gly Gln GluLys Tyr Asn Lys Ile Asp 610 615 620 Glu Leu Gly Phe Lys Ala Tyr Lys AspLeu Arg Asn Glu Trp Ala Gly 625 630 635 640 Trp Thr Asn Asp Asn Ser GlnGln Asn Ala Asn Lys Gly Thr Asp Asn 645 650 655 Ile Tyr Gln Pro Asn GlnAla Thr Val Val Lys Asp Asp 
Lys Cys Lys 660 665 670 Tyr Ser Glu Thr AsnSer Tyr Ala Asp Cys Ser Thr Thr Ala His Ile 675 680 685 Ser Gly Asp AsnTyr Phe Ile Ala Leu Lys Asp Asn Met Thr Ile Asn 690 695 700 Lys Tyr ValAsp Leu Gly Leu Gly Ala Arg Tyr Asp Arg Ile Lys His 705 710 715 720 LysSer Asp Val Pro Leu Val Asp Asn Ser Ala Ser Asn Gln Leu Ser 725 730 735Trp Asn Phe Gly Val Val Val Lys Pro Thr Asn Trp Leu Asp Ile Ala 740 745750 Tyr Arg Ser Ser Gln Gly Phe Arg Met Pro Ser Phe Ser Glu Met Tyr 755760 765 Gly Glu Arg Phe Gly Val Thr Ile Gly Lys Gly Thr Gln His Gly Cys770 775 780 Lys Gly Leu Tyr Tyr Ile Cys Gln Gln Thr Val His Gln Thr LysLeu 785 790 795 800 Lys Pro Glu Lys Ser Phe Asn Gln Glu Ile Gly Ala ThrLeu His Asn 805 810 815 His Leu Gly Ser Leu Glu Val Ser Tyr Phe Lys AsnArg Tyr Thr Asp 820 825 830 Leu Ile Val Gly Lys Ser Glu Glu Ile Arg ThrLeu Thr Gln Gly Asp 835 840 845 Asn Ala Gly Lys Gln Arg Gly Lys Gly AspLeu Gly Phe His Asn Gly 850 855 860 Gln Asp Ala Asp Leu Thr Gly Ile AsnIle Leu Gly Arg Leu Asp Leu 865 870 875 880 Asn Ala Val Asn Ser Arg LeuPro Tyr Gly Leu Tyr Ser Thr Leu Ala 885 890 895 Tyr Asn Lys Val Asp ValLys Gly Lys Thr Leu Asn Pro Thr Leu Ala 900 905 910 Gly Thr Asn Ile LeuPhe Asp Ala Ile Gln Pro Ser Arg Tyr Val Val 915 920 925 Gly Leu Gly TyrAsp Ala Pro Ser Gln Lys Trp Gly Ala Asn Ala Ile 930 935 940 Phe Thr HisSer Asp Ala Lys Asn Pro Ser Glu Leu Leu Ala Asp Lys 945 950 955 960 AsnLeu Gly Asn Gly Asn Ile Gln Thr Lys Gln Ala Thr Lys Ala Lys 965 970 975Ser Thr Pro Trp Gln Thr Leu Asp Leu Ser Gly Tyr Val Asn Ile Lys 980 985990 Asp Asn Phe Thr Leu Arg Ala Gly Val Tyr Asn Val Phe Asn Thr Tyr 9951000 1005 Tyr Thr Thr Trp Glu Ala Leu Arg Gln Thr Ala Glu Gly Ala ValAsn 1010 1015 1020 Gln His Thr Gly Leu Ser Gln Asp Lys His Tyr Gly ArgTyr Ala Ala 1025 1030 1035 1040 Pro Gly Arg Asn Tyr Gln Leu Ala Leu GluMet Lys Phe 1045 1050 702 amino acids amino acid single linear 11 MetLys His Ile Pro Leu Thr Thr Leu Cys Val Ala Ile Ser Ala Val 1 5 10 15Leu Leu Thr Ala Cys Gly Gly Ser Gly Gly Ser 
Asn Pro Pro Ala Pro 20 25 30Thr Pro Ile Pro Asn Ala Ser Gly Ser Gly Asn Thr Gly Asn Thr Gly 35 40 45Asn Ala Gly Gly Thr Asp Asn Thr Ala Asn Ala Gly Asn Thr Gly Gly 50 55 60Thr Asn Ser Gly Thr Gly Ser Ala Asn Thr Pro Glu Pro Lys Tyr Gln 65 70 7580 Asp Val Pro Thr Glu Lys Asn Glu Lys Asp Lys Val Ser Ser Ile Gln 85 9095 Glu Pro Ala Met Gly Tyr Gly Met Ala Leu Ser Lys Ile Asn Leu His 100105 110 Asn Arg Gln Asp Thr Pro Leu Asp Glu Lys Asn Ile Ile Thr Leu Asp115 120 125 Gly Lys Lys Gln Val Ala Glu Gly Lys Lys Ser Pro Leu Pro PheSer 130 135 140 Leu Asp Val Glu Asn Lys Leu Leu Asp Gly Tyr Ile Ala LysMet Asn 145 150 155 160 Val Ala Asp Lys Asn Ala Ile Gly Asp Arg Ile LysLys Gly Asn Lys 165 170 175 Glu Ile Ser Asp Glu Glu Leu Ala Lys Gln IleLys Glu Ala Val Arg 180 185 190 Lys Ser His Glu Phe Gln Gln Val Leu SerSer Leu Glu Asn Lys Ile 195 200 205 Phe His Ser Asn Asp Gly Thr Thr LysAla Thr Thr Arg Asp Leu Lys 210 215 220 Tyr Val Asp Tyr Gly Tyr Tyr LeuAla Asn Asp Gly Asn Tyr Leu Thr 225 230 235 240 Val Lys Thr Asp Lys LeuTrp Asn Leu Gly Pro Val Gly Gly Val Phe 245 250 255 Tyr Asn Gly Thr ThrThr Ala Lys Glu Leu Pro Thr Gln Asp Ala Val 260 265 270 Lys Tyr Lys GlyHis Trp Asp Phe Met Thr Asp Val Ala Asn Arg Arg 275 280 285 Asn Arg PheSer Glu Val Lys Glu Asn Ser Gln Ala Gly Trp Tyr Tyr 290 295 300 Gly AlaSer Ser Lys Asp Glu Tyr Asn Arg Leu Leu Thr Lys Glu Asp 305 310 315 320Ser Ala Pro Asp Gly His Ser Gly Glu Tyr Gly His Ser Ser Glu Phe 325 330335 Thr Val Asn Phe Lys Glu Lys Lys Leu Thr Gly Lys Leu Phe Ser Asn 340345 350 Leu Gln Asp Arg His Lys Gly Asn Val Thr Lys Thr Glu Arg Tyr Asp355 360 365 Ile Asp Ala Asn Ile His Gly Asn Arg Phe Arg Gly Ser Ala ThrAla 370 375 380 Ser Asn Lys Asn Asp Thr Ser Lys His Pro Phe Thr Ser AspAla Asn 385 390 395 400 Asn Arg Leu Glu Gly Gly Phe Tyr Gly Pro Lys GlyGlu Glu Leu Ala 405 410 415 Gly Lys Phe Leu Thr Asn Asp Asn Lys Leu PheGly Val Phe Gly Ala 420 425 430 Lys Arg Glu Ser Lys Ala Glu Glu Lys ThrGlu Ala Ile Leu Asp Ala 435 440 445 Tyr Ala Leu Gly 
Thr Phe Asn Thr SerAsn Ala Thr Thr Phe Thr Pro 450 455 460 Phe Thr Glu Lys Gln Leu Asp AsnPhe Gly Asn Ala Lys Lys Leu Val 465 470 475 480 Leu Gly Ser Thr Val IleAsp Leu Val Pro Thr Asp Ala Thr Lys Asn 485 490 495 Glu Phe Thr Lys AspLys Pro Glu Ser Ala Thr Asn Glu Ala Gly Glu 500 505 510 Thr Leu Met ValAsn Asp Glu Val Ser Val Lys Thr Tyr Gly Lys Asn 515 520 525 Phe Glu TyrLeu Lys Phe Gly Glu Leu Ser Ile Gly Gly Ser His Ser 530 535 540 Val PheLeu Gln Gly Glu Arg Thr Ala Thr Thr Gly Glu Lys Ala Val 545 550 555 560Pro Thr Thr Gly Thr Ala Lys Tyr Leu Gly Asn Trp Val Gly Tyr Ile 565 570575 Thr Gly Lys Asp Thr Gly Thr Gly Thr Gly Lys Ser Phe Thr Asp Ala 580585 590 Gln Asp Val Ala Asp Phe Asp Ile Asp Phe Gly Asn Lys Ser Val Ser595 600 605 Gly Lys Leu Ile Thr Lys Gly Arg Gln Asp Pro Val Phe Ser IleThr 610 615 620 Gly Gln Ile Ala Gly Asn Gly Trp Thr Gly Thr Ala Ser ThrThr Lys 625 630 635 640 Ala Asp Ala Gly Gly Tyr Lys Ile Asp Ser Ser SerThr Gly Lys Ser 645 650 655 Ile Val Ile Lys Asp Ala Asn Val Thr Gly GlyPhe Tyr Gly Pro Asn 660 665 670 Ala Asn Glu Met Gly Gly Ser Phe Thr HisAsn Ala Asp Asp Ser Lys 675 680 685 Ala Ser Val Val Phe Gly Thr Lys ArgGln Gln Glu Val Lys 690 695 700 682 amino acids amino acid single linear12 Cys Gly Gly Ser Gly Gly Ser Asn Pro Pro Ala Pro Thr Pro Ile Pro 1 510 15 Asn Ala Ser Gly Ser Gly Asn Thr Gly Asn Thr Gly Asn Ala Gly Gly 2025 30 Thr Asp Asn Thr Ala Asn Ala Gly Asn Thr Gly Gly Thr Asn Ser Gly 3540 45 Thr Gly Ser Ala Asn Thr Pro Glu Pro Lys Tyr Gln Asp Val Pro Thr 5055 60 Glu Lys Asn Glu Lys Asp Lys Val Ser Ser Ile Gln Glu Pro Ala Met 6570 75 80 Gly Tyr Gly Met Ala Leu Ser Lys Ile Asn Leu His Asn Arg Gln Asp85 90 95 Thr Pro Leu Asp Glu Lys Asn Ile Ile Thr Leu Asp Gly Lys Lys Gln100 105 110 Val Ala Glu Gly Lys Lys Ser Pro Leu Pro Phe Ser Leu Asp ValGlu 115 120 125 Asn Lys Leu Leu Asp Gly Tyr Ile Ala Lys Met Asn Val AlaAsp Lys 130 135 140 Asn Ala Ile Gly Asp Arg Ile Lys Lys Gly Asn Lys GluIle Ser Asp 145 150 155 160 Glu Glu Leu Ala Lys Gln Ile 
Lys Glu Ala ValArg Lys Ser His Glu 165 170 175 Phe Gln Gln Val Leu Ser Ser Leu Glu AsnLys Ile Phe His Ser Asn 180 185 190 Asp Gly Thr Thr Lys Ala Thr Thr ArgAsp Leu Lys Tyr Val Asp Tyr 195 200 205 Gly Tyr Tyr Leu Ala Asn Asp GlyAsn Tyr Leu Thr Val Lys Thr Asp 210 215 220 Lys Leu Trp Asn Leu Gly ProVal Gly Gly Val Phe Tyr Asn Gly Thr 225 230 235 240 Thr Thr Ala Lys GluLeu Pro Thr Gln Asp Ala Val Lys Tyr Lys Gly 245 250 255 His Trp Asp PheMet Thr Asp Val Ala Asn Arg Arg Asn Arg Phe Ser 260 265 270 Glu Val LysGlu Asn Ser Gln Ala Gly Trp Tyr Tyr Gly Ala Ser Ser 275 280 285 Lys AspGlu Tyr Asn Arg Leu Leu Thr Lys Glu Asp Ser Ala Pro Asp 290 295 300 GlyHis Ser Gly Glu Tyr Gly His Ser Ser Glu Phe Thr Val Asn Phe 305 310 315320 Lys Glu Lys Lys Leu Thr Gly Lys Leu Phe Ser Asn Leu Gln Asp Arg 325330 335 His Lys Gly Asn Val Thr Lys Thr Glu Arg Tyr Asp Ile Asp Ala Asn340 345 350 Ile His Gly Asn Arg Phe Arg Gly Ser Ala Thr Ala Ser Asn LysAsn 355 360 365 Asp Thr Ser Lys His Pro Phe Thr Ser Asp Ala Asn Asn ArgLeu Glu 370 375 380 Gly Gly Phe Tyr Gly Pro Lys Gly Glu Glu Leu Ala GlyLys Phe Leu 385 390 395 400 Thr Asn Asp Asn Lys Leu Phe Gly Val Phe GlyAla Lys Arg Glu Ser 405 410 415 Lys Ala Glu Glu Lys Thr Glu Ala Ile LeuAsp Ala Tyr Ala Leu Gly 420 425 430 Thr Phe Asn Thr Ser Asn Ala Thr ThrPhe Thr Pro Phe Thr Glu Lys 435 440 445 Gln Leu Asp Asn Phe Gly Asn AlaLys Lys Leu Val Leu Gly Ser Thr 450 455 460 Val Ile Asp Leu Val Pro ThrAsp Ala Thr Lys Asn Glu Phe Thr Lys 465 470 475 480 Asp Lys Pro Glu SerAla Thr Asn Glu Ala Gly Glu Thr Leu Met Val 485 490 495 Asn Asp Glu ValSer Val Lys Thr Tyr Gly Lys Asn Phe Glu Tyr Leu 500 505 510 Lys Phe GlyGlu Leu Ser Ile Gly Gly Ser His Ser Val Phe Leu Gln 515 520 525 Gly GluArg Thr Ala Thr Thr Gly Glu Lys Ala Val Pro Thr Thr Gly 530 535 540 ThrAla Lys Tyr Leu Gly Asn Trp Val Gly Tyr Ile Thr Gly Lys Asp 545 550 555560 Thr Gly Thr Gly Thr Gly Lys Ser Phe Thr Asp Ala Gln Asp Val Ala 565570 575 Asp Phe Asp Ile Asp Phe Gly Asn Lys Ser Val Ser Gly Lys Leu 
Ile580 585 590 Thr Lys Gly Arg Gln Asp Pro Val Phe Ser Ile Thr Gly Gln IleAla 595 600 605 Gly Asn Gly Trp Thr Gly Thr Ala Ser Thr Thr Lys Ala AspAla Gly 610 615 620 Gly Tyr Lys Ile Asp Ser Ser Ser Thr Gly Lys Ser IleVal Ile Lys 625 630 635 640 Asp Ala Asn Val Thr Gly Gly Phe Tyr Gly ProAsn Ala Asn Glu Met 645 650 655 Gly Gly Ser Phe Thr His Asn Ala Asp AspSer Lys Ala Ser Val Val 660 665 670 Phe Gly Thr Lys Arg Gln Gln Glu ValLys 675 680 1070 amino acids amino acid single linear 13 Met Asn Gln SerLys Lys Ser Lys Lys Ser Lys Gln Val Leu Lys Leu 1 5 10 15 Ser Ala LeuSer Leu Gly Leu Leu Asn Ile Thr Gln Val Ala Leu Ala 20 25 30 Asn Thr ThrAla Asp Lys Ala Glu Ala Thr Asp Lys Thr Asn Leu Val 35 40 45 Val Val LeuAsp Glu Thr Val Val Thr Ala Lys Lys Asn Ala Arg Lys 50 55 60 Ala Asn GluVal Thr Gly Leu Gly Lys Val Val Lys Thr Ala Glu Thr 65 70 75 80 Ile AsnLys Glu Gln Val Leu Asn Ile Arg Asp Leu Thr Arg Tyr Asp 85 90 95 Pro GlyIle Ala Val Val Glu Gln Gly Arg Gly Ala Ser Ser Gly Tyr 100 105 110 SerIle Arg Gly Met Asp Lys Asn Arg Val Ala Val Leu Val Asp Gly 115 120 125Ile Asn Gln Ala Gln His Tyr Ala Leu Gln Gly Pro Val Ala Gly Lys 130 135140 Asn Tyr Ala Ala Gly Gly Ala Ile Asn Glu Ile Glu Tyr Glu Asn Val 145150 155 160 Arg Ser Val Glu Ile Ser Lys Gly Ala Asn Ser Ser Glu Tyr GlySer 165 170 175 Gly Ala Leu Ser Gly Ser Val Ala Phe Val Thr Lys Thr AlaAsp Asp 180 185 190 Ile Ile Lys Asp Gly Lys Asp Trp Gly Val Gln Thr LysThr Ala Tyr 195 200 205 Ala Ser Lys Asn Asn Ala Trp Val Asn Ser Val AlaAla Ala Gly Lys 210 215 220 Ala Gly Ser Phe Ser Gly Leu Ile Ile Tyr ThrAsp Arg Arg Gly Gln 225 230 235 240 Glu Tyr Lys Ala His Asp Asp Ala TyrGln Gly Ser Gln Ser Phe Asp 245 250 255 Arg Ala Val Ala Thr Thr Asp ProAsn Asn Pro Lys Phe Leu Ile Ala 260 265 270 Asn Glu Cys Ala Asn Gly AsnTyr Glu Ala Cys Ala Ala Gly Gly Gln 275 280 285 Thr Lys Leu Gln Ala LysPro Thr Asn Val Arg Asp Lys Val Asn Val 290 295 300 Lys Asp Tyr Thr GlyPro Asn Arg Leu Ile Pro Asn Pro Leu Thr Gln 305 310 315 320 Asp Ser LysSer 
Leu Leu Leu Arg Pro Gly Tyr Gln Leu Asn Asp Lys 325 330 335 His TyrVal Gly Gly Val Tyr Glu Ile Thr Lys Gln Asn Tyr Ala Met 340 345 350 GlnAsp Lys Thr Val Pro Ala Tyr Leu Thr Val His Asp Ile Glu Lys 355 360 365Ser Arg Leu Ser Asn His Gly Gln Ala Asn Gly Tyr Tyr Gln Gly Asn 370 375380 Asn Leu Gly Glu Arg Ile Arg Asp Ala Ile Gly Ala Asn Ser Gly Tyr 385390 395 400 Gly Ile Asn Tyr Ala His Gly Val Phe Tyr Asp Glu Lys His GlnLys 405 410 415 Asp Arg Leu Gly Leu Glu Tyr Val Tyr Asp Ser Lys Gly GluAsn Lys 420 425 430 Trp Phe Asp Asp Val Arg Val Ser Tyr Asp Lys Gln AspIle Thr Leu 435 440 445 Arg Ser Gln Leu Thr Asn Thr His Cys Ser Thr TyrPro His Ile Asp 450 455 460 Lys Asn Cys Thr Pro Asp Val Asn Lys Pro PheSer Val Lys Glu Val 465 470 475 480 Asp Asn Asn Ala Tyr Lys Glu Gln HisAsn Leu Ile Lys Ala Val Phe 485 490 495 Asn Lys Lys Met Ala Leu Gly AsnThr His His His Ile Asn Leu Gln 500 505 510 Val Gly Tyr Asp Lys Phe AsnSer Ser Leu Ser Arg Glu Asp Tyr Arg 515 520 525 Leu Ala Thr His Gln SerTyr Gln Lys Leu Asp Tyr Thr Pro Pro Ser 530 535 540 Asn Pro Leu Pro AspLys Phe Lys Pro Ile Leu Gly Ser Asn Asn Arg 545 550 555 560 Pro Ile CysLeu Asp Ala Tyr Gly Tyr Gly His Asp His Pro Gln Ala 565 570 575 Cys AsnAla Lys Asn Ser Thr Tyr Gln Asn Phe Ala Ile Lys Lys Gly 580 585 590 IleGlu Gln Tyr Asn Gln Thr Asn Thr Asp Lys Ile Asp Tyr Gln Ala 595 600 605Val Ile Asp Gln Tyr Asp Lys Gln Asn Pro Asn Ser Thr Leu Lys Pro 610 615620 Phe Glu Lys Ile Lys Gln Ser Leu Gly Gln Glu Lys Tyr Asp Glu Ile 625630 635 640 Asp Arg Leu Gly Phe Asn Ala Tyr Lys Asp Leu Arg Asn Glu TrpAla 645 650 655 Gly Trp Thr Asn Asp Asn Ser Gln Gln Asn Ala Asn Lys GlyThr Asp 660 665 670 Asn Ile Tyr Gln Pro Asn Gln Ala Thr Val Val Lys AspAsp Lys Cys 675 680 685 Lys Tyr Ser Glu Thr Asn Ser Tyr Ala Asp Cys SerThr Thr Arg His 690 695 700 Ile Ser Gly Asp Asn Tyr Phe Ile Ala Leu LysAsp Asn Met Thr Ile 705 710 715 720 Asn Lys Tyr Val Asp Leu Gly Leu GlyAla Arg Tyr Asp Arg Ile Lys 725 730 735 His Lys Ser Asp Val Pro Leu ValAsp Asn Ser Ala 
Ser Asn Gln Leu 740 745 750 Ser Trp Asn Phe Gly Val ValVal Lys Pro Thr Asn Trp Leu Asp Ile 755 760 765 Ala Tyr Arg Ser Ser GlnGly Phe Arg Met Pro Ser Phe Ser Glu Met 770 775 780 Tyr Gly Glu Arg PheGly Val Thr Ile Gly Lys Gly Thr Gln His Gly 785 790 795 800 Cys Lys GlyLeu Tyr Tyr Ile Cys Gln Gln Thr Val His Gln Thr Lys 805 810 815 Leu LysPro Glu Lys Ser Phe Asn Gln Glu Ile Gly Ala Thr Leu His 820 825 830 AsnHis Leu Gly Ser Leu Glu Val Ser Tyr Phe Lys Asn Arg Tyr Thr 835 840 845Asp Leu Ile Val Gly Lys Ser Glu Glu Ile Arg Thr Leu Thr Gln Gly 850 855860 Asp Asn Ala Gly Lys Gln Arg Gly Lys Gly Asp Leu Gly Phe His Asn 865870 875 880 Gly Gln Asp Ala Asp Leu Thr Gly Ile Asn Ile Leu Gly Arg LeuAsp 885 890 895 Leu Asn Ala Val Asn Ser Arg Leu Pro Tyr Gly Leu Tyr SerThr Leu 900 905 910 Ala Tyr Asn Lys Val Asp Val Lys Gly Lys Thr Leu AsnPro Thr Leu 915 920 925 Ala Gly Thr Asn Ile Leu Phe Asp Ala Ile Gln ProSer Arg Tyr Val 930 935 940 Val Gly Leu Gly Tyr Asp Ala Pro Ser Gln LysTrp Gly Ala Asn Ala 945 950 955 960 Ile Phe Thr His Ser Asp Ala Lys AsnPro Ser Glu Leu Leu Ala Asp 965 970 975 Lys Asn Leu Gly Asn Gly Asn AsnGln Thr Lys Gln Ala Thr Lys Ala 980 985 990 Lys Ser Thr Pro Trp Gln ThrLeu Asp Leu Ser Gly Tyr Val Asn Ile 995 1000 1005 Lys Asp Asn Phe ThrLeu Arg Ala Gly Val Tyr Asn Val Phe Asn Thr 1010 1015 1020 Tyr Tyr ThrThr Trp Glu Ala Leu Arg Gln Thr Ala Glu Gly Ala Val 1025 1030 1035 1040Asn Gln His Thr Gly Leu Ser Gln Asp Lys His Tyr Gly Arg Tyr Ala 10451050 1055 Ala Pro Gly Arg Asn Tyr Gln Leu Ala Leu Glu Met Lys Phe 10601065 1070 1052 amino acids amino acid single linear 14 Leu Ser Leu GlyLeu Leu Asn Ile Thr Gln Val Ala Leu Ala Asn Thr 1 5 10 15 Thr Ala AspLys Ala Glu Ala Thr Asp Lys Thr Asn Leu Val Val Val 20 25 30 Leu Asp GluThr Val Val Thr Ala Lys Lys Asn Ala Arg Lys Ala Asn 35 40 45 Glu Val ThrGly Leu Gly Lys Val Val Lys Thr Ala Glu Thr Ile Asn 50 55 60 Lys Glu GlnVal Leu Asn Ile Arg Asp Leu Thr Arg Tyr Asp Pro Gly 65 70 75 80 Ile AlaVal Val Glu Gln Gly Arg Gly Ala 
Ser Ser Gly Tyr Ser Ile 85 90 95 Arg Gly Met Asp Lys Asn Arg Val Ala Val Leu Val Asp Gly Ile Asn 100 105 110 Gln Ala Gln His Tyr Ala Leu Gln Gly Pro Val Ala Gly Lys Asn Tyr 115 120 125 Ala Ala Gly Gly Ala Ile Asn Glu Ile Glu Tyr Glu Asn Val Arg Ser 130 135 140 Val Glu Ile Ser Lys Gly Ala Asn Ser Ser Glu Tyr Gly Ser Gly Ala 145 150 155 160 Leu Ser Gly Ser Val Ala Phe Val Thr Lys Thr Ala Asp Asp Ile Ile 165 170 175 Lys Asp Gly Lys Asp Trp Gly Val Gln Thr Lys Thr Ala Tyr Ala Ser 180 185 190 Lys Asn Asn Ala Trp Val Asn Ser Val Ala Ala Ala Gly Lys Ala Gly 195 200 205 Ser Phe Ser Gly Leu Ile Ile Tyr Thr Asp Arg Arg Gly Gln Glu Tyr 210 215 220 Lys Ala His Asp Asp Ala Tyr Gln Gly Ser Gln Ser Phe Asp Arg Ala 225 230 235 240 Val Ala Thr Thr Asp Pro Asn Asn Pro Lys Phe Leu Ile Ala Asn Glu 245 250 255 Cys Ala Asn Gly Asn Tyr Glu Ala Cys Ala Ala Gly Gly Gln Thr Lys 260 265 270 Leu Gln Ala Lys Pro Thr Asn Val Arg Asp Lys Val Asn Val Lys Asp 275 280 285 Tyr Thr Gly Pro Asn Arg Leu Ile Pro Asn Pro Leu Thr Gln Asp Ser 290 295 300 Lys Ser Leu Leu Leu Arg Pro Gly Tyr Gln Leu Asn Asp Lys His Tyr 305 310 315 320 Val Gly Gly Val Tyr Glu Ile Thr Lys Gln Asn Tyr Ala Met Gln Asp 325 330 335 Lys Thr Val Pro Ala Tyr Leu Thr Val His Asp Ile Glu Lys Ser Arg 340 345 350 Leu Ser Asn His Gly Gln Ala Asn Gly Tyr Tyr Gln Gly Asn Asn Leu 355 360 365 Gly Glu Arg Ile Arg Asp Ala Ile Gly Ala Asn Ser Gly Tyr Gly Ile 370 375 380 Asn Tyr Ala His Gly Val Phe Tyr Asp Glu Lys His Gln Lys Asp Arg 385 390 395 400 Leu Gly Leu Glu Tyr Val Tyr Asp Ser Lys Gly Glu Asn Lys Trp Phe 405 410 415 Asp Asp Val Arg Val Ser Tyr Asp Lys Gln Asp Ile Thr Leu Arg Ser 420 425 430 Gln Leu Thr Asn Thr His Cys Ser Thr Tyr Pro His Ile Asp Lys Asn 435 440 445 Cys Thr Pro Asp Val Asn Lys Pro Phe Ser Val Lys Glu Val Asp Asn 450 455 460 Asn Ala Tyr Lys Glu Gln His Asn Leu Ile Lys Ala Val Phe Asn Lys 465 470 475 480 Lys Met Ala Leu Gly Asn Thr His His His Ile Asn Leu Gln Val Gly 485 490 495 Tyr Asp Lys Phe Asn Ser Ser Leu Ser Arg Glu Asp Tyr Arg Leu Ala 500 505
510 Thr His Gln Ser Tyr Gln Lys Leu Asp Tyr Thr Pro Pro Ser Asn Pro 515 520 525 Leu Pro Asp Lys Phe Lys Pro Ile Leu Gly Ser Asn Asn Arg Pro Ile 530 535 540 Cys Leu Asp Ala Tyr Gly Tyr Gly His Asp His Pro Gln Ala Cys Asn 545 550 555 560 Ala Lys Asn Ser Thr Tyr Gln Asn Phe Ala Ile Lys Lys Gly Ile Glu 565 570 575 Gln Tyr Asn Gln Thr Asn Thr Asp Lys Ile Asp Tyr Gln Ala Val Ile 580 585 590 Asp Gln Tyr Asp Lys Gln Asn Pro Asn Ser Thr Leu Lys Pro Phe Glu 595 600 605 Lys Ile Lys Gln Ser Leu Gly Gln Glu Lys Tyr Asp Glu Ile Asp Arg 610 615 620 Leu Gly Phe Asn Ala Tyr Lys Asp Leu Arg Asn Glu Trp Ala Gly Trp 625 630 635 640 Thr Asn Asp Asn Ser Gln Gln Asn Ala Asn Lys Gly Thr Asp Asn Ile 645 650 655 Tyr Gln Pro Asn Gln Ala Thr Val Val Lys Asp Asp Lys Cys Lys Tyr 660 665 670 Ser Glu Thr Asn Ser Tyr Ala Asp Cys Ser Thr Thr Arg His Ile Ser 675 680 685 Gly Asp Asn Tyr Phe Ile Ala Leu Lys Asp Asn Met Thr Ile Asn Lys 690 695 700 Tyr Val Asp Leu Gly Leu Gly Ala Arg Tyr Asp Arg Ile Lys His Lys 705 710 715 720 Ser Asp Val Pro Leu Val Asp Asn Ser Ala Ser Asn Gln Leu Ser Trp 725 730 735 Asn Phe Gly Val Val Val Lys Pro Thr Asn Trp Leu Asp Ile Ala Tyr 740 745 750 Arg Ser Ser Gln Gly Phe Arg Met Pro Ser Phe Ser Glu Met Tyr Gly 755 760 765 Glu Arg Phe Gly Val Thr Ile Gly Lys Gly Thr Gln His Gly Cys Lys 770 775 780 Gly Leu Tyr Tyr Ile Cys Gln Gln Thr Val His Gln Thr Lys Leu Lys 785 790 795 800 Pro Glu Lys Ser Phe Asn Gln Glu Ile Gly Ala Thr Leu His Asn His 805 810 815 Leu Gly Ser Leu Glu Val Ser Tyr Phe Lys Asn Arg Tyr Thr Asp Leu 820 825 830 Ile Val Gly Lys Ser Glu Glu Ile Arg Thr Leu Thr Gln Gly Asp Asn 835 840 845 Ala Gly Lys Gln Arg Gly Lys Gly Asp Leu Gly Phe His Asn Gly Gln 850 855 860 Asp Ala Asp Leu Thr Gly Ile Asn Ile Leu Gly Arg Leu Asp Leu Asn 865 870 875 880 Ala Val Asn Ser Arg Leu Pro Tyr Gly Leu Tyr Ser Thr Leu Ala Tyr 885 890 895 Asn Lys Val Asp Val Lys Gly Lys Thr Leu Asn Pro Thr Leu Ala Gly 900 905 910 Thr Asn Ile Leu Phe Asp Ala Ile Gln Pro Ser Arg Tyr Val Val Gly 915 920 925 Leu Gly Tyr Asp Ala Pro Ser
Gln Lys Trp Gly Ala Asn Ala Ile Phe 930 935 940 Thr His Ser Asp Ala Lys Asn Pro Ser Glu Leu Leu Ala Asp Lys Asn 945 950 955 960 Leu Gly Asn Gly Asn Asn Gln Thr Lys Gln Ala Thr Lys Ala Lys Ser 965 970 975 Thr Pro Trp Gln Thr Leu Asp Leu Ser Gly Tyr Val Asn Ile Lys Asp 980 985 990 Asn Phe Thr Leu Arg Ala Gly Val Tyr Asn Val Phe Asn Thr Tyr Tyr 995 1000 1005 Thr Thr Trp Glu Ala Leu Arg Gln Thr Ala Glu Gly Ala Val Asn Gln 1010 1015 1020 His Thr Gly Leu Ser Gln Asp Lys His Tyr Gly Arg Tyr Ala Ala Pro 1025 1030 1035 1040 Gly Arg Asn Tyr Gln Leu Ala Leu Glu Met Lys Phe 1045 1050 709 amino acids amino acid single linear 15 Met Lys His Ile Pro Leu Thr Thr Leu Cys Val Ala Ile Ser Ala Val 1 5 10 15 Leu Leu Thr Ala Cys Gly Gly Ser Ser Gly Gly Phe Asn Pro Pro Ala 20 25 30 Ser Thr Pro Ile Pro Asn Ala Gly Asn Ser Gly Asn Ala Gly Asn Ala 35 40 45 Gly Asn Ala Gly Gly Thr Gly Gly Ala Asn Ser Gly Ala Gly Asn Ala 50 55 60 Gly Gly Thr Gly Gly Ala Asn Ser Gly Ala Gly Ser Ala Ser Thr Pro 65 70 75 80 Glu Pro Lys Tyr Lys Asp Val Pro Thr Asp Glu Asn Lys Lys Ala Glu 85 90 95 Val Ser Gly Ile Gln Glu Pro Ala Met Gly Tyr Gly Val Glu Leu Lys 100 105 110 Leu Arg Asn Trp Ile Pro Gln Glu Gln Glu Glu His Ala Lys Ile Asn 115 120 125 Thr Asn Asp Val Val Lys Leu Glu Gly Asp Leu Lys His Asn Pro Phe 130 135 140 Asp Asn Ser Ile Trp Gln Asn Ile Lys Asn Ser Lys Glu Val Gln Thr 145 150 155 160 Val Tyr Asn Gln Glu Lys Gln Asn Ile Glu Asp Gln Ile Lys Arg Glu 165 170 175 Asn Lys Gln Arg Pro Asp Lys Lys Leu Asp Asp Val Ala Leu Gln Ala 180 185 190 Tyr Ile Glu Lys Val Leu Asp Asp Arg Leu Thr Glu Leu Ala Lys Pro 195 200 205 Ile Tyr Glu Lys Asn Ile Asn Tyr Ser His Asp Lys Gln Asn Lys Ala 210 215 220 Arg Thr Arg Asp Leu Lys Tyr Val Arg Ser Gly Tyr Ile Tyr Arg Ser 225 230 235 240 Gly Tyr Ser Asn Ile Ile Pro Lys Lys Ile Ala Lys Thr Gly Phe Asp 245 250 255 Gly Ala Leu Phe Tyr Gln Gly Thr Gln Thr Ala Lys Gln Leu Pro Val 260 265 270 Ser Gln Val Lys Tyr Lys Gly Thr Trp Asp Phe Met Thr Asp Ala Lys 275 280 285 Lys Gly Gln Ser Phe Ser Ser Phe Gly Thr
Ser Gln Arg Leu Ala Gly 290 295 300 Asp Arg Tyr Ser Ala Met Ser Tyr His Glu Tyr Pro Ser Leu Leu Thr 305 310 315 320 Asp Glu Lys Asn Lys Pro Asp Asn Tyr Asn Gly Glu Tyr Gly His Ser 325 330 335 Ser Glu Phe Thr Val Asp Phe Ser Lys Lys Ser Leu Lys Gly Glu Leu 340 345 350 Ser Ser Asn Ile Gln Asp Gly His Lys Gly Ser Val Asn Lys Thr Lys 355 360 365 Arg Tyr Asp Ile Asp Ala Asn Ile Tyr Gly Asn Arg Phe Arg Gly Ser 370 375 380 Ala Thr Ala Ser Asp Thr Thr Glu Ala Ser Lys Ser Lys His Pro Phe 385 390 395 400 Thr Ser Asp Ala Lys Asn Ser Leu Glu Gly Gly Phe Tyr Gly Pro Asn 405 410 415 Ala Glu Glu Leu Ala Gly Lys Phe Leu Thr Asn Asp Asn Lys Leu Phe 420 425 430 Gly Val Phe Gly Ala Lys Arg Glu Ser Glu Ala Lys Glu Lys Thr Glu 435 440 445 Ala Ile Leu Asp Ala Tyr Ala Leu Gly Thr Phe Asn Lys Pro Gly Thr 450 455 460 Thr Asn Pro Ala Phe Thr Ala Asn Ser Lys Lys Glu Leu Asp Asn Phe 465 470 475 480 Gly Asn Ala Lys Lys Leu Val Leu Gly Ser Thr Val Ile Asp Leu Val 485 490 495 Pro Thr Gly Ala Thr Lys Asp Val Asn Glu Phe Lys Glu Lys Pro Lys 500 505 510 Ser Ala Thr Asn Lys Ala Gly Glu Thr Leu Met Val Asn Asp Glu Val 515 520 525 Ile Val Lys Thr Tyr Gly Tyr Gly Arg Asn Phe Glu Tyr Leu Lys Phe 530 535 540 Gly Glu Leu Ser Ile Gly Gly Ser His Ser Val Phe Leu Gln Gly Glu 545 550 555 560 Arg Thr Ala Glu Lys Ala Val Pro Thr Glu Gly Thr Ala Lys Tyr Leu 565 570 575 Gly Asn Trp Val Gly Tyr Ile Thr Gly Lys Asp Thr Gly Thr Ser Thr 580 585 590 Gly Lys Ser Phe Asn Glu Ala Gln Asp Ile Ala Asp Phe Asp Ile Asp 595 600 605 Phe Glu Arg Lys Ser Val Lys Gly Lys Leu Thr Thr Gln Gly Arg Gln 610 615 620 Asp Pro Val Phe Asn Ile Thr Gly Gln Ile Ala Gly Asn Gly Trp Thr 625 630 635 640 Gly Thr Ala Ser Thr Ala Lys Ala Asn Val Gly Gly Tyr Lys Ile Asp 645 650 655 Ser Ser Ser Thr Gly Lys Ser Ile Val Ile Glu Asn Ala Lys Val Thr 660 665 670 Gly Gly Phe Tyr Gly Pro Asn Ala Asn Glu Met Gly Gly Ser Phe Thr 675 680 685 His Asp Thr Asp Asp Ser Lys Ala Ser Val Val Phe Gly Thr Lys Arg 690 695 700 Gln Glu Glu Val Lys 705 689 amino acids amino acid single linear 16 Cys
Gly Gly Ser Ser Gly Gly Phe Asn Pro Pro Ala Ser Thr Pro Ile 1 5 10 15 Pro Asn Ala Gly Asn Ser Gly Asn Ala Gly Asn Ala Gly Asn Ala Gly 20 25 30 Gly Thr Gly Gly Ala Asn Ser Gly Ala Gly Asn Ala Gly Gly Thr Gly 35 40 45 Gly Ala Asn Ser Gly Ala Gly Ser Ala Ser Thr Pro Glu Pro Lys Tyr 50 55 60 Lys Asp Val Pro Thr Asp Glu Asn Lys Lys Ala Glu Val Ser Gly Ile 65 70 75 80 Gln Glu Pro Ala Met Gly Tyr Gly Val Glu Leu Lys Leu Arg Asn Trp 85 90 95 Ile Pro Gln Glu Gln Glu Glu His Ala Lys Ile Asn Thr Asn Asp Val 100 105 110 Val Lys Leu Glu Gly Asp Leu Lys His Asn Pro Phe Asp Asn Ser Ile 115 120 125 Trp Gln Asn Ile Lys Asn Ser Lys Glu Val Gln Thr Val Tyr Asn Gln 130 135 140 Glu Lys Gln Asn Ile Glu Asp Gln Ile Lys Arg Glu Asn Lys Gln Arg 145 150 155 160 Pro Asp Lys Lys Leu Asp Asp Val Ala Leu Gln Ala Tyr Ile Glu Lys 165 170 175 Val Leu Asp Asp Arg Leu Thr Glu Leu Ala Lys Pro Ile Tyr Glu Lys 180 185 190 Asn Ile Asn Tyr Ser His Asp Lys Gln Asn Lys Ala Arg Thr Arg Asp 195 200 205 Leu Lys Tyr Val Arg Ser Gly Tyr Ile Tyr Arg Ser Gly Tyr Ser Asn 210 215 220 Ile Ile Pro Lys Lys Ile Ala Lys Thr Gly Phe Asp Gly Ala Leu Phe 225 230 235 240 Tyr Gln Gly Thr Gln Thr Ala Lys Gln Leu Pro Val Ser Gln Val Lys 245 250 255 Tyr Lys Gly Thr Trp Asp Phe Met Thr Asp Ala Lys Lys Gly Gln Ser 260 265 270 Phe Ser Ser Phe Gly Thr Ser Gln Arg Leu Ala Gly Asp Arg Tyr Ser 275 280 285 Ala Met Ser Tyr His Glu Tyr Pro Ser Leu Leu Thr Asp Glu Lys Asn 290 295 300 Lys Pro Asp Asn Tyr Asn Gly Glu Tyr Gly His Ser Ser Glu Phe Thr 305 310 315 320 Val Asp Phe Ser Lys Lys Ser Leu Lys Gly Glu Leu Ser Ser Asn Ile 325 330 335 Gln Asp Gly His Lys Gly Ser Val Asn Lys Thr Lys Arg Tyr Asp Ile 340 345 350 Asp Ala Asn Ile Tyr Gly Asn Arg Phe Arg Gly Ser Ala Thr Ala Ser 355 360 365 Asp Thr Thr Glu Ala Ser Lys Ser Lys His Pro Phe Thr Ser Asp Ala 370 375 380 Lys Asn Ser Leu Glu Gly Gly Phe Tyr Gly Pro Asn Ala Glu Glu Leu 385 390 395 400 Ala Gly Lys Phe Leu Thr Asn Asp Asn Lys Leu Phe Gly Val Phe Gly 405 410 415 Ala Lys Arg Glu Ser Glu Ala Lys Glu Lys Thr Glu Ala
Ile Leu Asp 420 425 430 Ala Tyr Ala Leu Gly Thr Phe Asn Lys Pro Gly Thr Thr Asn Pro Ala 435 440 445 Phe Thr Ala Asn Ser Lys Lys Glu Leu Asp Asn Phe Gly Asn Ala Lys 450 455 460 Lys Leu Val Leu Gly Ser Thr Val Ile Asp Leu Val Pro Thr Gly Ala 465 470 475 480 Thr Lys Asp Val Asn Glu Phe Lys Glu Lys Pro Lys Ser Ala Thr Asn 485 490 495 Lys Ala Gly Glu Thr Leu Met Val Asn Asp Glu Val Ile Val Lys Thr 500 505 510 Tyr Gly Tyr Gly Arg Asn Phe Glu Tyr Leu Lys Phe Gly Glu Leu Ser 515 520 525 Ile Gly Gly Ser His Ser Val Phe Leu Gln Gly Glu Arg Thr Ala Glu 530 535 540 Lys Ala Val Pro Thr Glu Gly Thr Ala Lys Tyr Leu Gly Asn Trp Val 545 550 555 560 Gly Tyr Ile Thr Gly Lys Asp Thr Gly Thr Ser Thr Gly Lys Ser Phe 565 570 575 Asn Glu Ala Gln Asp Ile Ala Asp Phe Asp Ile Asp Phe Glu Arg Lys 580 585 590 Ser Val Lys Gly Lys Leu Thr Thr Gln Gly Arg Gln Asp Pro Val Phe 595 600 605 Asn Ile Thr Gly Gln Ile Ala Gly Asn Gly Trp Thr Gly Thr Ala Ser 610 615 620 Thr Ala Lys Ala Asn Val Gly Gly Tyr Lys Ile Asp Ser Ser Ser Thr 625 630 635 640 Gly Lys Ser Ile Val Ile Glu Asn Ala Lys Val Thr Gly Gly Phe Tyr 645 650 655 Gly Pro Asn Ala Asn Glu Met Gly Gly Ser Phe Thr His Asp Thr Asp 660 665 670 Asp Ser Lys Ala Ser Val Val Phe Gly Thr Lys Arg Gln Glu Glu Val 675 680 685 Lys 7 amino acids amino acid single linear 17 Asn Glu Val Thr Gly Leu Gly 1 5 7 amino acids amino acid single linear 18 Gly Ala Ile Asn Glu Ile Glu 1 5 60 base pairs nucleic acid single linear 19 AATCAATCAA AACAAAACAA CAAATCCAAA AAATCCAAAC AAGTATTAAA ACTTAGTGCC 60 57 base pairs nucleic acid single linear 20 AAACACATTC CTTTAACCAC ACTGTGTGTG GCAATCTCTG CCGTCTTATT AACCGCT 57 912 amino acids amino acid single linear 21 Met Thr Lys Lys Pro Tyr Phe Arg Leu Ser Ile Ile Ser Cys Leu Leu 1 5 10 15 Ile Gly Cys Tyr Val Lys Ala Glu Thr Gln Ser Ile Lys Asp Thr Lys 20 25 30 Glu Ala Ile Ser Ser Glu Val Asp Thr Gln Ser Thr Glu Asp Ser Glu 35 40 45 Leu Glu Thr Ile Ser Val Thr Ala Glu Lys Ile Arg Asp Arg Lys Asp 50 55 60 Asn Glu Val Thr Gly Leu Gly Lys Ile Ile Lys Thr Ser Glu Ser Ile
65 70 75 80 Ser Arg Glu Gln Val Leu Asn Ile Arg Asp Leu Thr Arg Tyr Asp Pro 85 90 95 Gly Ile Ser Val Val Glu Gln Gly Arg Gly Ala Ser Ser Gly Tyr Ser 100 105 110 Ile Arg Gly Met Asp Arg Asn Arg Val Ala Leu Leu Val Asp Gly Leu 115 120 125 Pro Gln Thr Gln Ser Tyr Val Val Gln Ser Pro Leu Val Ala Arg Ser 130 135 140 Gly Tyr Ser Gly Thr Gly Ala Ile Asn Glu Ile Glu Tyr Glu Asn Val 145 150 155 160 Lys Ala Val Glu Ile Ser Lys Gly Gly Ser Ser Ser Glu Tyr Gly Asn 165 170 175 Gly Ala Leu Ala Gly Ser Val Thr Phe Gln Ser Lys Ser Ala Ala Asp 180 185 190 Ile Leu Glu Gly Asp Lys Ser Trp Gly Ile Gln Thr Lys Asn Ala Tyr 195 200 205 Ser Ser Lys Asn Lys Gly Phe Thr His Ser Leu Ala Val Ala Gly Lys 210 215 220 Gln Gly Gly Phe Glu Gly Leu Ala Ile Tyr Thr Gln Arg Asn Ser Ile 225 230 235 240 Glu Thr Gln Val His Lys Asp Ala Leu Lys Gly Val Gln Ser Tyr Asp 245 250 255 Arg Leu Ile Ala Thr Thr Asp Lys Ser Ser Gly Tyr Phe Val Ile Gln 260 265 270 Gly Glu Cys Pro Asn Gly Asp Asp Lys Cys Ala Ala Lys Pro Pro Ala 275 280 285 Thr Leu Ser Thr Gln Ser Glu Thr Val Ser Val Ser Asp Tyr Thr Gly 290 295 300 Ala Asn Arg Ile Lys Pro Asn Pro Met Lys Tyr Glu Ser Gln Ser Trp 305 310 315 320 Phe Leu Arg Gly Gly Tyr His Phe Ser Glu Gln His Tyr Ile Gly Gly 325 330 335 Ile Phe Glu Phe Thr Gln Gln Lys Phe Asp Ile Arg Asp Met Thr Phe 340 345 350 Pro Ala Tyr Leu Ser Pro Thr Glu Arg Arg Asp Asp Ser Ser Arg Ser 355 360 365 Phe Tyr Pro Met Gln Asp His Gly Ala Tyr Gln His Ile Glu Asp Gly 370 375 380 Arg Gly Val Lys Tyr Ala Ser Gly Leu Tyr Phe Asp Glu His His Arg 385 390 395 400 Lys Gln Arg Val Gly Ile Glu Tyr Ile Tyr Glu Asn Lys Asn Lys Ala 405 410 415 Gly Ile Ile Asp Lys Ala Val Leu Ser Ala Asn Gln Gln Asn Ile Ile 420 425 430 Leu Asp Ser Tyr Met Arg His Thr His Cys Ser Leu Tyr Pro Asn Pro 435 440 445 Ser Lys Asn Cys Arg Pro Thr Leu Asp Lys Pro Tyr Ser Tyr Tyr Arg 450 455 460 Ser Asp Arg Asn Val Tyr Lys Glu Lys His Asn Met Leu Gln Leu Asn 465 470 475 480 Leu Glu Lys Lys Ile Gln Gln Asn Trp Leu Thr His Gln Ile Val Phe 485 490 495 Asn Leu Gly Phe Asp
Asp Phe Thr Ser Ala Leu Gln His Lys Asp Tyr 500 505 510 Leu Thr Arg Arg Val Ile Ala Thr Ala Asp Ser Ile Pro Arg Lys Pro 515 520 525 Gly Glu Thr Gly Lys Pro Arg Asn Gly Leu Gln Ser Gln Pro Tyr Leu 530 535 540 Tyr Pro Lys Pro Glu Pro Tyr Phe Ala Gly Gln Asp His Cys Asn Tyr 545 550 555 560 Gln Gly Ser Ser Ser Asn Tyr Arg Asp Cys Lys Val Arg Leu Ile Lys 565 570 575 Gly Lys Asn Tyr Tyr Phe Ala Ala Arg Asn Asn Met Ala Leu Gly Lys 580 585 590 Tyr Val Asp Leu Gly Leu Gly Ile Arg Tyr Asp Val Ser Arg Thr Lys 595 600 605 Ala Asn Glu Ser Thr Ile Ser Val Gly Lys Phe Lys Asn Phe Ser Trp 610 615 620 Asn Thr Gly Ile Val Ile Lys Pro Thr Glu Trp Leu Asp Leu Ser Tyr 625 630 635 640 Arg Leu Ser Thr Gly Phe Arg Asn Pro Ser Phe Ser Glu Met Tyr Gly 645 650 655 Trp Arg Tyr Gly Gly Lys Asn Asp Glu Val Tyr Val Gly Lys Phe Lys 660 665 670 Pro Glu Thr Ser Arg Asn Gln Glu Phe Gly Leu Ala Leu Lys Gly Asp 675 680 685 Phe Gly Asn Ile Glu Ile Ser His Phe Ser Asn Ala Tyr Arg Asn Leu 690 695 700 Ile Ala Phe Ala Glu Glu Leu Ser Lys Asn Gly Thr Gly Lys Gly Asn 705 710 715 720 Tyr Gly Tyr His Asn Ala Gln Asn Ala Lys Leu Val Gly Val Asn Ile 725 730 735 Thr Ala Gln Leu Asp Phe Asn Gly Leu Trp Lys Arg Ile Pro Tyr Gly 740 745 750 Trp Tyr Ala Thr Phe Ala Tyr Asn Gln Val Lys Val Lys Asp Gln Lys 755 760 765 Ile Asn Ala Gly Leu Ala Ser Val Ser Ser Tyr Leu Phe Asp Ala Ile 770 775 780 Gln Pro Ser Arg Tyr Ile Ile Gly Leu Gly Tyr Asp His Pro Ser Asn 785 790 795 800 Thr Trp Gly Ile Asn Thr Met Phe Thr Gln Ser Lys Ala Lys Ser Gln 805 810 815 Asn Glu Leu Leu Gly Lys Arg Ala Leu Gly Asn Asn Ser Arg Asp Val 820 825 830 Lys Ser Thr Arg Lys Leu Thr Arg Ala Trp His Ile Leu Asp Val Ser 835 840 845 Gly Tyr Tyr Met Ala Asn Lys Asn Ile Met Leu Arg Leu Gly Ile Tyr 850 855 860 Asn Leu Phe Asn Tyr Arg Tyr Val Thr Trp Glu Ala Val Arg Gln Thr 865 870 875 880 Ala Gln Gly Ala Val Asn Gln His Gln Asn Val Gly Ser Tyr Thr Arg 885 890 895 Tyr Ala Ala Ser Gly Arg Asn Tyr Thr Leu Thr Leu Glu Met Lys Phe 900 905 910 908 amino acids amino acid single linear 22 Met Gln
Gln Gln His Leu Phe Arg Leu Asn Ile Leu Cys Leu Ser Leu 1 5 10 15 Met Thr Ala Leu Pro Val Tyr Ala Glu Asn Val Gln Ala Glu Gln Ala 20 25 30 Gln Glu Lys Gln Leu Asp Thr Ile Gln Val Lys Ala Lys Lys Gln Lys 35 40 45 Thr Arg Arg Asp Asn Glu Val Thr Gly Leu Gly Lys Leu Val Lys Ser 50 55 60 Ser Asp Thr Leu Ser Lys Glu Gln Val Leu Asn Ile Arg Asp Leu Thr 65 70 75 80 Arg Tyr Asp Pro Gly Ile Ala Val Val Glu Gln Gly Arg Gly Ala Ser 85 90 95 Ser Gly Tyr Ser Ile Arg Gly Met Asp Lys Asn Arg Val Ser Leu Thr 100 105 110 Val Asp Gly Val Ser Gln Ile Gln Ser Tyr Thr Ala Gln Ala Ala Leu 115 120 125 Gly Gly Thr Arg Thr Ala Gly Ser Ser Gly Ala Ile Asn Glu Ile Glu 130 135 140 Tyr Glu Asn Val Lys Ala Val Glu Ile Ser Lys Gly Ser Asn Ser Ser 145 150 155 160 Glu Tyr Gly Asn Gly Ala Leu Ala Gly Ser Val Ala Phe Gln Thr Lys 165 170 175 Thr Ala Ala Asp Ile Ile Gly Glu Gly Lys Gln Trp Gly Ile Gln Ser 180 185 190 Lys Thr Ala Tyr Ser Gly Lys Asp His Ala Leu Thr Gln Ser Leu Ala 195 200 205 Leu Ala Gly Arg Ser Gly Gly Ala Glu Ala Leu Leu Ile Tyr Thr Lys 210 215 220 Arg Arg Gly Arg Glu Ile His Ala His Lys Asp Ala Gly Lys Gly Val 225 230 235 240 Gln Ser Phe Asn Arg Leu Val Leu Asp Glu Asp Lys Lys Glu Gly Gly 245 250 255 Ser Gln Tyr Arg Tyr Phe Ile Val Glu Glu Glu Cys His Asn Gly Tyr 260 265 270 Ala Ala Cys Lys Asn Lys Leu Lys Glu Asp Ala Ser Val Lys Asp Glu 275 280 285 Arg Lys Thr Val Ser Thr Gln Asp Tyr Thr Gly Ser Asn Arg Leu Leu 290 295 300 Ala Asn Pro Leu Glu Tyr Gly Ser Gln Ser Trp Leu Phe Arg Pro Gly 305 310 315 320 Trp His Leu Asp Asn Arg His Tyr Val Gly Ala Val Leu Glu Arg Thr 325 330 335 Gln Gln Thr Phe Asp Thr Arg Asp Met Thr Val Pro Ala Tyr Phe Thr 340 345 350 Ser Glu Asp Tyr Val Pro Gly Ser Leu Lys Gly Leu Gly Lys Tyr Ser 355 360 365 Gly Asp Asn Lys Ala Glu Arg Leu Phe Val Gln Gly Glu Gly Ser Thr 370 375 380 Leu Gln Gly Ile Gly Tyr Gly Thr Gly Val Phe Tyr Asp Glu Arg His 385 390 395 400 Thr Lys Asn Arg Tyr Gly Val Glu Tyr Val Tyr His Asn Ala Asp Lys 405 410 415 Asp Thr Trp Ala Asp Tyr Ala Arg Leu Ser Tyr Asp Arg
Gln Gly Ile 420 425 430 Asp Leu Asp Asn Arg Leu Gln Gln Thr His Cys Ser His Asp Gly Ser 435 440 445 Asp Lys Asn Cys Arg Pro Asp Gly Asn Lys Pro Tyr Ser Phe Tyr Lys 450 455 460 Ser Asp Arg Met Ile Tyr Glu Glu Ser Arg Asn Leu Phe Gln Ala Val 465 470 475 480 Phe Lys Lys Ala Phe Asp Thr Ala Lys Ile Arg His Asn Leu Ser Ile 485 490 495 Asn Leu Gly Tyr Asp Arg Phe Lys Ser Gln Leu Ser His Ser Asp Tyr 500 505 510 Tyr Leu Gln Asn Ala Val Gln Ala Tyr Asp Leu Ile Thr Pro Pro Lys 515 520 525 Pro Pro Phe Pro Asn Gly Ser Lys Asp Asn Pro Tyr Arg Val Ser Ile 530 535 540 Gly Lys Thr Thr Val Asn Thr Ser Pro Ile Cys Arg Phe Gly Asn Asn 545 550 555 560 Thr Tyr Thr Asp Cys Thr Pro Arg Asn Ile Gly Gly Asn Gly Tyr Tyr 565 570 575 Ala Ala Val Gln Asp Asn Val Arg Leu Gly Arg Trp Ala Asp Val Gly 580 585 590 Ala Gly Ile Arg Tyr Asp Tyr Arg Ser Thr His Ser Glu Asp Lys Ser 595 600 605 Val Ser Thr Gly Thr His Arg Asn Leu Ser Trp Asn Ala Gly Val Val 610 615 620 Leu Lys Pro Phe Thr Trp Met Asp Leu Thr Tyr Arg Ala Ser Thr Gly 625 630 635 640 Phe Arg Leu Pro Ser Phe Ala Glu Met Tyr Gly Trp Arg Ala Gly Glu 645 650 655 Ser Leu Lys Thr Leu Asp Leu Lys Pro Glu Lys Ser Phe Asn Arg Glu 660 665 670 Ala Gly Ile Val Phe Lys Gly Asp Phe Gly Asn Leu Glu Ala Ser Tyr 675 680 685 Phe Asn Asn Ala Tyr Arg Asp Leu Ile Ala Phe Gly Tyr Glu Thr Arg 690 695 700 Thr Gln Asn Gly Gln Thr Ser Ala Ser Gly Asp Pro Gly Tyr Arg Asn 705 710 715 720 Ala Gln Asn Ala Arg Ile Ala Gly Ile Asn Ile Leu Gly Lys Ile Asp 725 730 735 Trp His Gly Val Trp Gly Gly Leu Pro Asp Gly Leu Tyr Ser Thr Leu 740 745 750 Ala Tyr Asn Arg Ile Lys Val Lys Asp Ala Asp Ile Arg Ala Asp Arg 755 760 765 Thr Phe Val Thr Ser Tyr Leu Phe Asp Ala Val Gln Pro Ser Arg Tyr 770 775 780 Val Leu Gly Leu Gly Tyr Asp His Pro Asp Gly Ile Trp Gly Ile Asn 785 790 795 800 Thr Met Phe Thr Tyr Ser Lys Ala Lys Ser Val Asp Glu Leu Leu Gly 805 810 815 Ser Gln Ala Leu Leu Asn Gly Asn Ala Asn Ala Lys Lys Ala Ala Ser 820 825 830 Arg Arg Thr Arg Pro Trp Tyr Val Thr Asp Val Ser Gly Tyr Tyr Asn 835 840 845 Ile Lys
Lys His Leu Thr Leu Arg Ala Gly Val Tyr Asn Leu Leu Asn 850 855 860 Tyr Arg Tyr Val Thr Trp Glu Asn Val Arg Gln Thr Ala Gly Gly Ala 865 870 875 880 Val Asn Gln His Lys Asn Val Gly Val Tyr Asn Arg Tyr Ala Ala Pro 885 890 895 Gly Arg Asn Tyr Thr Phe Ser Leu Glu Met Lys Phe 900 905 911 amino acids amino acid single linear 23 Met Gln Gln Gln His Leu Phe Arg Leu Asn Ile Leu Cys Leu Ser Leu 1 5 10 15 Met Thr Ala Leu Pro Ala Tyr Ala Glu Asn Val Gln Ala Gly Gln Ala 20 25 30 Gln Glu Lys Gln Leu Asp Thr Ile Gln Val Lys Ala Lys Lys Gln Lys 35 40 45 Thr Arg Arg Asp Asn Glu Val Thr Gly Leu Gly Lys Leu Val Lys Thr 50 55 60 Ala Asp Thr Leu Ser Lys Glu Gln Val Leu Asp Ile Arg Asp Leu Thr 65 70 75 80 Arg Tyr Asp Pro Gly Ile Ala Val Val Glu Gln Gly Arg Gly Ala Ser 85 90 95 Ser Gly Tyr Ser Ile Arg Gly Met Asp Lys Asn Arg Val Ser Leu Thr 100 105 110 Val Asp Gly Leu Ala Gln Ile Gln Ser Tyr Thr Ala Gln Ala Ala Leu 115 120 125 Gly Gly Thr Arg Thr Ala Gly Ser Ser Gly Ala Ile Asn Glu Ile Glu 130 135 140 Tyr Glu Asn Val Lys Ala Val Glu Ile Ser Lys Gly Ser Asn Ser Val 145 150 155 160 Glu Gln Gly Ser Gly Ala Leu Ala Gly Ser Val Ala Phe Gln Thr Lys 165 170 175 Thr Ala Asp Asp Val Ile Gly Glu Gly Arg Gln Trp Gly Ile Gln Ser 180 185 190 Lys Thr Ala Tyr Ser Gly Lys Asn Arg Gly Leu Thr Gln Ser Ile Ala 195 200 205 Leu Ala Gly Arg Ile Gly Gly Ala Glu Ala Leu Leu Ile His Thr Gly 210 215 220 Arg Arg Ala Gly Glu Ile Arg Ala His Glu Asp Ala Gly Arg Gly Val 225 230 235 240 Gln Ser Phe Asn Arg Leu Val Pro Val Glu Asp Ser Ser Glu Tyr Ala 245 250 255 Tyr Phe Ile Val Glu Asp Glu Cys Glu Gly Lys Asn Tyr Glu Thr Cys 260 265 270 Lys Ser Lys Pro Lys Lys Asp Val Val Gly Lys Asp Glu Arg Gln Thr 275 280 285 Val Ser Thr Arg Asp Tyr Thr Gly Pro Asn Arg Phe Leu Ala Asp Pro 290 295 300 Leu Ser Tyr Glu Ser Arg Ser Trp Leu Phe Arg Pro Gly Phe Arg Phe 305 310 315 320 Glu Asn Lys Arg His Tyr Ile Gly Gly Ile Leu Glu His Thr Gln Gln 325 330 335 Thr Phe Asp Thr Arg Asp Met Thr Val Pro Ala Phe Leu Thr Lys Ala 340 345 350 Val Phe Asp Ala Asn Ser Lys Gln
Ala Gly Ser Leu Pro Gly Asn Gly 355 360 365 Lys Tyr Ala Gly Asn His Lys Tyr Gly Gly Leu Phe Thr Asn Gly Glu 370 375 380 Asn Gly Ala Leu Val Gly Ala Glu Tyr Gly Thr Gly Val Phe Tyr Asp 385 390 395 400 Glu Thr His Thr Lys Ser Arg Tyr Gly Leu Glu Tyr Val Tyr Thr Asn 405 410 415 Ala Asp Lys Asp Thr Trp Ala Asp Tyr Ala Arg Leu Ser Tyr Asp Arg 420 425 430 Gln Gly Ile Gly Leu Asp Asn His Phe Gln Gln Thr His Cys Ser Ala 435 440 445 Asp Gly Ser Asp Lys Tyr Cys Arg Pro Ser Ala Asp Lys Pro Phe Ser 450 455 460 Tyr Tyr Lys Ser Asp Arg Val Ile Tyr Gly Glu Ser His Arg Leu Leu 465 470 475 480 Gln Ala Ala Phe Lys Lys Ser Phe Asp Thr Ala Lys Ile Arg His Asn 485 490 495 Leu Ser Val Asn Leu Gly Phe Asp Arg Phe Asp Ser Asn Leu Arg His 500 505 510 Gln Asp Tyr Tyr Tyr Gln His Ala Asn Arg Ala Tyr Ser Ser Lys Thr 515 520 525 Pro Pro Lys Thr Ala Asn Pro Asn Gly Asp Lys Ser Lys Pro Tyr Trp 530 535 540 Val Ser Ile Gly Gly Gly Asn Val Val Thr Gly Gln Ile Cys Leu Phe 545 550 555 560 Gly Asn Asn Thr Tyr Thr Asp Cys Thr Pro Arg Ser Ile Asn Gly Lys 565 570 575 Ser Tyr Tyr Ala Ala Val Arg Asp Asn Val Arg Leu Gly Arg Trp Ala 580 585 590 Asp Val Gly Ala Gly Leu Arg Tyr Asp Tyr Arg Ser Thr His Ser Asp 595 600 605 Asp Gly Ser Val Ser Thr Gly Thr His Arg Thr Leu Ser Trp Asn Ala 610 615 620 Gly Ile Val Leu Lys Pro Ala Asp Trp Leu Asp Leu Thr Tyr Arg Thr 625 630 635 640 Ser Thr Gly Phe Arg Leu Pro Ser Phe Ala Glu Met Tyr Gly Trp Arg 645 650 655 Ser Gly Val Gln Ser Lys Ala Val Lys Ile Asp Pro Glu Lys Ser Phe 660 665 670 Asn Lys Glu Ala Gly Ile Val Phe Lys Gly Asp Phe Gly Asn Leu Glu 675 680 685 Ala Ser Trp Phe Asn Asn Ala Tyr Arg Asp Leu Ile Val Arg Gly Tyr 690 695 700 Glu Ala Gln Ile Lys Asn Gly Lys Glu Glu Ala Lys Gly Asp Pro Ala 705 710 715 720 Tyr Leu Asn Ala Gln Ser Ala Arg Ile Thr Gly Ile Asn Ile Leu Gly 725 730 735 Lys Ile Asp Trp Asn Gly Val Trp Asp Lys Leu Pro Glu Gly Trp Tyr 740 745 750 Ser Thr Phe Ala Tyr Asn Arg Val His Val Arg Asp Ile Lys Lys Arg 755 760 765 Ala Asp Arg Thr Asp Ile Gln Ser His Leu Phe Asp Ala Ile Gln Pro
770 775 780 Ser Arg Tyr Val Val Gly Leu Gly Tyr Asp Gln Pro Glu Gly Lys Trp 785 790 795 800 Gly Val Asn Gly Met Leu Thr Tyr Ser Lys Ala Lys Glu Ile Thr Glu 805 810 815 Leu Leu Gly Ser Arg Ala Leu Leu Asn Gly Asn Ser Arg Asn Thr Lys 820 825 830 Ala Thr Ala Arg Arg Thr Arg Pro Trp Tyr Ile Val Asp Val Ser Gly 835 840 845 Tyr Tyr Thr Ile Lys Lys His Phe Thr Leu Arg Ala Gly Val Tyr Asn 850 855 860 Leu Leu Asn Tyr Arg Tyr Val Thr Trp Glu Asn Val Arg Gln Thr Ala 865 870 875 880 Gly Gly Ala Val Asn Gln His Lys Asn Val Gly Val Tyr Asn Arg Tyr 885 890 895 Ala Ala Pro Gly Arg Asn Tyr Thr Phe Ser Leu Glu Met Lys Phe 900 905 910 915 amino acids amino acid single linear 24 Met Gln Gln Gln His Leu Phe Arg Leu Asn Ile Leu Cys Leu Ser Leu 1 5 10 15 Met Thr Ala Leu Pro Ala Tyr Ala Glu Asn Val Gln Ala Gly Gln Ala 20 25 30 Gln Glu Lys Gln Leu Asp Thr Ile Gln Val Lys Ala Lys Lys Gln Lys 35 40 45 Thr Arg Arg Asp Asn Glu Val Thr Gly Leu Gly Lys Leu Val Lys Thr 50 55 60 Ala Asp Thr Leu Ser Lys Glu Gln Val Leu Asp Ile Arg Asp Leu Thr 65 70 75 80 Arg Tyr Asp Pro Gly Ile Ala Val Val Glu Gln Gly Arg Gly Ala Ser 85 90 95 Ser Gly Tyr Ser Ile Arg Gly Met Asp Lys Asn Arg Val Ser Leu Thr 100 105 110 Val Asp Gly Leu Ala Gln Ile Gln Ser Tyr Thr Ala Gln Ala Ala Leu 115 120 125 Gly Gly Thr Arg Thr Ala Gly Ser Ser Gly Ala Ile Asn Glu Ile Glu 130 135 140 Tyr Glu Asn Val Lys Ala Val Glu Ile Ser Lys Gly Ser Asn Ser Val 145 150 155 160 Glu Gln Gly Ser Gly Ala Leu Ala Gly Ser Val Ala Phe Gln Thr Lys 165 170 175 Thr Ala Asp Asp Val Ile Gly Glu Gly Arg Gln Trp Gly Ile Gln Ser 180 185 190 Lys Thr Ala Tyr Ser Gly Lys Asn Arg Gly Leu Thr Gln Ser Leu Ala 195 200 205 Leu Ala Gly Arg Ile Gly Gly Ala Glu Ala Leu Leu Ile Arg Thr Gly 210 215 220 Arg His Ala Gly Glu Ile Arg Ala His Glu Ala Ala Gly Arg Gly Val 225 230 235 240 Gln Ser Phe Asn Arg Leu Ala Pro Val Asp Asp Gly Ser Lys Tyr Ala 245 250 255 Tyr Phe Ile Val Glu Glu Glu Cys Lys Asn Gly Gly His Glu Lys Cys 260 265 270 Lys Ala Asn Pro Pro Lys Asp Val Val Gly Glu Asp Lys Arg Gln Thr 275 280
285 Val Ser Thr Arg Asp Tyr Thr Gly Pro Asn Arg Phe Leu Ala Asp Pro 290 295 300 Leu Ser Tyr Glu Ser Arg Ser Trp Leu Phe Arg Pro Gly Phe Arg Phe 305 310 315 320 Glu Asn Lys Arg His Tyr Ile Gly Gly Ile Leu Glu Arg Thr Gln Gln 325 330 335 Thr Phe Asp Thr Arg Asp Met Thr Val Pro Ala Phe Leu Thr Lys Ala 340 345 350 Val Phe Asp Ala Asn Gln Lys Gln Ala Gly Ser Leu Arg Gly Asn Gly 355 360 365 Lys Tyr Ala Gly Asn His Lys Tyr Gly Gly Leu Phe Thr Ser Gly Glu 370 375 380 Asn Asn Ala Pro Val Gly Ala Glu Tyr Gly Thr Gly Val Phe Tyr Asp 385 390 395 400 Glu Thr His Thr Lys Ser Arg Tyr Gly Leu Glu Tyr Val Tyr Thr Asn 405 410 415 Ala Asp Lys Asp Thr Trp Ala Asp Tyr Ala Arg Leu Ser Tyr Asp Arg 420 425 430 Gln Gly Ile Gly Leu Asp Asn His Phe Gln Gln Thr His Cys Ser Ala 435 440 445 Asp Gly Ser Asp Lys Tyr Cys Arg Pro Ser Ala Asp Lys Pro Phe Ser 450 455 460 Tyr Tyr Lys Ser Asp Arg Val Ile Tyr Gly Glu Ser His Lys Leu Leu 465 470 475 480 Gln Ala Ala Phe Lys Lys Ser Phe Asp Thr Ala Lys Ile Arg His Asn 485 490 495 Leu Ser Val Asn Leu Gly Tyr Asp Arg Phe Gly Ser Asn Leu Arg His 500 505 510 Gln Asp Tyr Tyr Tyr Gln Ser Ala Asn Arg Ala Tyr Ser Ser Lys Thr 515 520 525 Pro Pro Gln Asn Asn Gly Lys Lys Thr Ser Pro Asn Gly Arg Glu Lys 530 535 540 Asn Pro Tyr Trp Val Ser Ile Gly Arg Gly Asn Val Val Thr Arg Gln 545 550 555 560 Ile Cys Leu Phe Gly Asn Asn Thr Tyr Thr Asp Cys Thr Pro Arg Ser 565 570 575 Ile Asn Gly Lys Ser Tyr Tyr Ala Ala Val Arg Asp Asn Val Arg Leu 580 585 590 Gly Arg Trp Ala Asp Val Gly Ala Gly Leu Arg Tyr Asp Tyr Arg Ser 595 600 605 Thr His Ser Asp Asp Gly Ser Val Ser Thr Gly Thr His Arg Thr Leu 610 615 620 Ser Trp Asn Ala Gly Ile Val Leu Lys Pro Ala Asp Trp Leu Asp Leu 625 630 635 640 Thr Tyr Arg Thr Ser Thr Gly Phe Arg Leu Pro Ser Phe Ala Glu Met 645 650 655 Tyr Gly Trp Arg Ser Gly Asp Lys Ile Lys Ala Val Lys Ile Asp Pro 660 665 670 Glu Lys Ser Phe Asn Lys Glu Ala Gly Ile Val Phe Lys Gly Asp Phe 675 680 685 Gly Asn Leu Glu Ala Ser Trp Phe Asn Asn Ala Tyr Arg Asp Leu Ile 690 695 700 Val Arg Gly Tyr Glu Ala Gln
Ile Lys Asp Gly Lys Glu Gln Val Lys 705 710 715 720 Gly Asn Pro Ala Tyr Leu Asn Ala Gln Ser Ala Arg Ile Thr Gly Ile 725 730 735 Asn Ile Leu Gly Lys Ile Asp Trp Asn Gly Val Trp Asp Lys Leu Pro 740 745 750 Glu Gly Trp Tyr Ser Thr Phe Ala Tyr Asn Arg Val Arg Val Arg Asp 755 760 765 Ile Lys Lys Arg Ala Asp Arg Thr Asp Ile Gln Ser His Leu Phe Asp 770 775 780 Ala Ile Gln Pro Ser Arg Tyr Val Val Gly Ser Gly Tyr Asp Gln Pro 785 790 795 800 Glu Gly Lys Trp Gly Val Asn Gly Met Leu Thr Tyr Ser Lys Ala Lys 805 810 815 Glu Ile Thr Glu Leu Leu Gly Ser Arg Ala Leu Leu Asn Gly Asn Ser 820 825 830 Arg Asn Thr Lys Ala Thr Ser Arg Arg Thr Arg Pro Trp Tyr Ile Val 835 840 845 Asp Val Ser Gly Tyr Tyr Thr Val Lys Lys His Phe Thr Leu Arg Ala 850 855 860 Gly Val Tyr Asn Leu Leu Asn His Arg Tyr Val Thr Trp Glu Asn Val 865 870 875 880 Arg Gln Thr Ala Ala Gly Ala Val Asn Gln His Lys Asn Val Gly Val 885 890 895 Tyr Asn Arg Tyr Ala Ala Pro Gly Arg Asn Tyr Thr Phe Ser Leu Glu 900 905 910 Met Lys Phe 915 657 amino acids amino acid single linear 25 Met Lys Ser Val Pro Leu Ile Ser Gly Gly Leu Ser Phe Leu Leu Ser 1 5 10 15 Ala Cys Ser Gly Gly Gly Ser Phe Asp Val Asp Asn Val Ser Asn Thr 20 25 30 Pro Ser Ser Lys Pro Arg Tyr Gln Asp Asp Thr Ser Asn Gln Arg Lys 35 40 45 Lys Ser Asn Leu Lys Lys Leu Phe Ile Ser Leu Gly Tyr Gly Met Lys 50 55 60 Leu Val Ala Gln Asn Leu Arg Gly Asn Lys Glu Pro Ser Phe Leu Asn 65 70 75 80 Glu Asp Asp Tyr Ile Ser Tyr Phe Ser Ser Leu Ser Thr Ile Glu Lys 85 90 95 Asp Val Lys Asp Asn Lys Asn Gly Ala Asp Leu Ile Gly Ser Ile Asp 100 105 110 Glu Pro Ser Thr Thr Asn Pro Pro Glu Lys His His Gly Gln Lys Tyr 115 120 125 Val Tyr Ser Gly Leu Tyr Tyr Thr Pro Ser Trp Ser Leu Asn Asp Ser 130 135 140 Lys Asn Lys Phe Tyr Leu Gly Tyr Tyr Gly Tyr Ala Phe Tyr Tyr Gly 145 150 155 160 Asn Lys Thr Ala Thr Asn Leu Pro Val Asn Gly Val Val Lys Tyr Lys 165 170 175 Gly Thr Trp Asp Phe Ile Thr Ala Thr Lys Asn Gly Lys Arg Tyr Pro 180 185 190 Leu Leu Ser Asn Gly Gly Ser His Ala Tyr Tyr Arg Arg Ser Ala Ile 195 200 205 Pro Glu Asp Ile
Asp Leu Glu Asn Asp Ser Lys Asn Gly Asp Ile Gly 210 215 220 Leu Ile Ser Glu Phe Ser Ala Asp Phe Gly Thr Lys Lys Leu Thr Gly 225 230 235 240 Gln Leu Ser Tyr Thr Lys Arg Lys Thr Asn Asn Gln Pro Tyr Glu Lys 245 250 255 Lys Lys Leu Tyr Asp Ile Asp Ala Asp Ile Tyr Ser Asn Arg Phe Arg 260 265 270 Gly Thr Val Lys Pro Thr Glu Lys Asp Ser Glu Glu His Pro Phe Thr 275 280 285 Ser Glu Gly Thr Leu Glu Gly Gly Phe Tyr Pro Asn Ala Glu Glu Leu 290 295 300 Gly Gly Lys Phe Leu Ala Thr Asp Asn Arg Val Phe Gly Val Phe Ser 305 310 315 320 Ala Lys Glu Thr Glu Glu Thr Lys Lys Glu Ala Leu Ser Lys Glu Thr 325 330 335 Leu Ile Asp Gly Lys Leu Ile Thr Phe Ser Thr Lys Lys Thr Asp Ala 340 345 350 Lys Thr Asn Ala Thr Thr Ser Thr Ala Ala Asn Thr Thr Thr Asp Thr 355 360 365 Thr Ala Asn Thr Ile Thr Asp Glu Lys Asn Phe Lys Thr Glu Asp Ile 370 375 380 Ser Ser Phe Gly Glu Ala Asp Tyr Leu Leu Ile Asp Lys Tyr Pro Ile 385 390 395 400 Pro Leu Leu Pro Asp Lys Asn Thr Asn Asp Phe Ile Ser Ser Lys His 405 410 415 His Thr Val Gly Asn Lys Arg Tyr Lys Val Glu Ala Cys Cys Ser Asn 420 425 430 Leu Tyr Val Lys Phe Gly Met Tyr Tyr Glu Asp Pro Leu Lys Glu Lys 435 440 445 Glu Thr Glu Thr Glu Thr Glu Thr Glu Lys Asp Lys Glu Lys Glu Lys 450 455 460 Glu Lys Asp Lys Asp Lys Glu Lys Gln Thr Ala Ala Thr Thr Asn Thr 465 470 475 480 Tyr Tyr Gln Phe Leu Leu Gly His Arg Thr Pro Lys Asp Asp Ile Pro 485 490 495 Lys Thr Gly Ser Ala Lys Tyr His Gly Ser Trp Phe Gly Tyr Ile Thr 500 505 510 Asp Gly Lys Thr Ser Tyr Ser Pro Ser Gly Asp Lys Lys Arg Asp Lys 515 520 525 Asn Ala Val Ala Glu Phe Asn Val Asp Phe Ala Glu Lys Lys Leu Thr 530 535 540 Gly Glu Leu Lys Arg His Asp Thr Gly Asn Pro Val Phe Ser Ile Glu 545 550 555 560 Ala Asn Phe Asn Asn Ser Ser Asn Ala Phe Thr Gly Thr Ala Thr Ala 565 570 575 Thr Asn Phe Val Ile Asp Gly Lys Asn Ser Gln Asn Lys Asn Thr Pro 580 585 590 Ile Asn Ile Thr Thr Lys Val Asn Gly Ala Phe Tyr Gly Pro Lys Ala 595 600 605 Ser Glu Leu Gly Gly Tyr Phe Thr Tyr Asn Gly Asn Ser Thr Ala Thr 610 615 620 Asn Ser Glu Ser Ser Ser Thr Val Ser Ser Ser Ser
Asn Ser Lys Asn 625 630 635 640 Ala Arg Ala Ala Val Val Phe Gly Ala Arg Gln Gln Val Glu Thr Thr 645 650 655 Lys 601 amino acids amino acid single linear 26 Met Asn Asn Pro Leu Val Asn Gln Ala Ala Met Val Leu Pro Val Phe 1 5 10 15 Leu Leu Ser Ala Cys Leu Gly Gly Gly Gly Ser Phe Asp Leu Asp Ser 20 25 30 Val Glu Thr Val Gln Asp Met His Ser Lys Pro Lys Tyr Glu Asp Glu 35 40 45 Lys Ser Gln Pro Glu Ser Gln Gln Asp Val Ser Glu Asn Ser Gly Ala 50 55 60 Ala Tyr Gly Phe Ala Val Lys Leu Pro Arg Arg Asn Ala His Phe Asn 65 70 75 80 Pro Lys Tyr Lys Glu Lys His Lys Pro Leu Gly Ser Met Asp Trp Lys 85 90 95 Lys Leu Gln Arg Gly Glu Pro Asn Ser Phe Ser Glu Arg Asp Glu Leu 100 105 110 Glu Lys Lys Arg Gly Ser Ser Glu Leu Ile Glu Ser Lys Trp Glu Asp 115 120 125 Gly Gln Ser Arg Val Val Gly Tyr Thr Asn Phe Thr Tyr Val Arg Ser 130 135 140 Gly Tyr Val Tyr Leu Asn Lys Asn Asn Ile Asp Ile Lys Asn Asn Ile 145 150 155 160 Val Leu Phe Gly Pro Asp Gly Tyr Leu Tyr Tyr Lys Gly Lys Glu Pro 165 170 175 Ser Lys Glu Leu Pro Ser Glu Lys Ile Thr Tyr Lys Gly Thr Trp Asp 180 185 190 Tyr Val Thr Asp Ala Met Glu Lys Gln Arg Phe Glu Gly Leu Gly Ser 195 200 205 Ala Ala Gly Gly Asp Lys Ser Gly Ala Leu Ser Ala Leu Glu Glu Gly 210 215 220 Val Leu Arg Asn Gln Ala Glu Ala Ser Ser Gly His Thr Asp Phe Gly 225 230 235 240 Met Thr Ser Glu Phe Glu Val Asp Phe Ser Asp Lys Thr Ile Lys Gly 245 250 255 Thr Leu Tyr Arg Asn Asn Arg Ile Thr Gln Asn Asn Ser Glu Asn Lys 260 265 270 Gln Ile Lys Thr Thr Arg Tyr Thr Ile Gln Ala Thr Leu His Gly Asn 275 280 285 Arg Phe Lys Gly Lys Ala Leu Ala Ala Asp Lys Gly Ala Thr Asn Gly 290 295 300 Ser His Pro Phe Ile Ser Asp Ser Asp Ser Leu Glu Gly Gly Phe Tyr 305 310 315 320 Gly Pro Lys Gly Glu Glu Leu Ala Gly Lys Phe Leu Ser Asn Asp Asn 325 330 335 Lys Val Ala Ala Val Phe Gly Ala Lys Gln Lys Asp Lys Lys Asp Gly 340 345 350 Glu Asn Ala Ala Gly Pro Ala Thr Glu Thr Val Ile Asp Ala Tyr Arg 355 360 365 Ile Thr Gly Glu Glu Phe Lys Lys Glu Gln Ile Asp Ser Phe Gly Asp 370 375 380 Val Lys Lys Leu Leu Val Asp Gly Val Glu Leu Ser
Leu Leu Pro Ser 385 390 395 400 Glu Gly Asn Lys Ala Ala Phe GlnHis Glu Ile Glu Gln Asn Gly Val 405 410 415 Lys Ala Thr Val Cys Cys SerAsn Leu Asp Tyr Met Ser Phe Gly Lys 420 425 430 Leu Ser Lys Glu Asn LysAsp Asp Met Phe Leu Gln Gly Val Arg Thr 435 440 445 Pro Val Ser Asp ValAla Ala Arg Thr Glu Ala Asn Ala Lys Tyr Arg 450 455 460 Gly Thr Trp TyrGly Tyr Ile Ala Asn Gly Thr Ser Trp Ser Gly Glu 465 470 475 480 Ala SerAsn Gln Phe Thr Glu Gly Gly Asn Arg Ala Glu Phe Asp Val 485 490 495 AspPhe Ser Thr Lys Lys Ile Ser Gly Thr Leu Thr Ala Lys Asp Arg 500 505 510Thr Ser Pro Ala Phe Thr Ile Thr Ala Met Ile Lys Asp Asn Gly Phe 515 520525 Ser Gly Val Ala Lys Thr Gly Glu Asn Gly Phe Ala Leu Asp Pro Gln 530535 540 Asn Thr Gly Asn Ser His Tyr Thr His Ile Glu Ala Thr Val Ser Gly545 550 555 560 Gly Phe Tyr Gly Lys Asn Ala Ile Glu Met Gly Gly Ser PheSer Phe 565 570 575 Pro Gly Asn Ala Pro Glu Gly Lys Gln Glu Lys Ala SerVal Val Phe 580 585 590 Gly Ala Lys Arg Gln Gln Leu Val Gln 595 600 711amino acids amino acid single linear 27 Met Asn Asn Pro Leu Val Asn GlnAla Ala Met Val Leu Pro Val Phe 1 5 10 15 Leu Leu Ser Ala Cys Leu GlyGly Gly Gly Ser Phe Asp Leu Asp Ser 20 25 30 Val Asp Thr Glu Ala Pro ArgPro Ala Pro Lys Tyr Gln Asp Val Ser 35 40 45 Ser Glu Lys Pro Gln Ala GlnGln Asp Gln Gly Gly Tyr Gly Phe Ala 50 55 60 Met Arg Leu Lys Arg Arg AsnTrp Tyr Pro Gly Ala Glu Glu Ser Glu 65 70 75 80 Val Lys Leu Asn Glu SerAsp Trp Glu Ala Thr Gly Leu Pro Thr Lys 85 90 95 Pro Lys Glu Leu Pro LysArg Gln Lys Ser Val Ile Glu Lys Val Glu 100 105 110 Thr Asp Gly Asp SerAsp Ile Tyr Ser Ser Pro Tyr Leu Thr Pro Ser 115 120 125 Asn His Gln AsnGly Ser Ala Gly Asn Gly Val Asn Gln Pro Lys Asn 130 135 140 Gln Ala ThrGly His Glu Asn Phe Gln Tyr Val Tyr Ser Gly Trp Phe 145 150 155 160 TyrHis Ala Ala Ser Glu Lys Asp Phe Ser Asn Lys Lys Ile Trp Lys 165 170 175Ser Gly Asp Asp Gly Tyr Ile Phe Tyr His Gly Glu Lys Pro Ser Arg 180 185190 Gln Leu Pro Ala Ser Gly Lys Val Ile Tyr Lys Gly Val Trp His Phe 195200 205 Val Thr 
Asp Thr Lys Lys Gly Gln Asp Phe Arg Glu Ile Ile Gln Pro210 215 220 Ser Lys Lys Gln Gly Asp Arg Tyr Ser Gly Phe Ser Gly Asp GlySer 225 230 235 240 Glu Glu Tyr Ser Asn Lys Asn Glu Ser Thr Leu Lys AspAsp His Glu 245 250 255 Gly Tyr Gly Phe Thr Ser Asn Leu Glu Val Asp PheGly Asn Lys Lys 260 265 270 Leu Thr Gly Lys Leu Ile Arg Asn Asn Ala SerLeu Asn Asn Asn Thr 275 280 285 Asn Asn Asp Lys His Thr Thr Gln Tyr TyrSer Leu Asp Ala Gln Ile 290 295 300 Thr Gly Gly Asn Pro Phe Asn Gly ThrAla Thr Ala Thr Asp Lys Lys 305 310 315 320 Glu Asn Glu Thr Lys Leu HisPro Phe Val Ser Asp Ser Ser Ser Leu 325 330 335 Glu Gly Gly Phe Phe GlyPro Gln Gly Glu Glu Leu Gly Phe Arg Phe 340 345 350 Leu Thr Asp Asp GlnLys Val Ala Val Val Gly Ser Ala Lys Thr Lys 355 360 365 Asp Lys Leu GluAsn Gly Ala Ala Ala Ser Gly Ser Gly Ala Ala Ala 370 375 380 Ser Gly GlyAla Ala Gly Thr Ser Ser Glu Asn Ser Lys Leu Thr Thr 385 390 395 400 ValLeu Asp Ala Val Glu Leu Thr Leu Asn Asp Lys Lys Ile Lys Asn 405 410 415Leu Asp Asn Phe Ser Asn Ala Ala Gln Leu Val Val Asp Gly Ile Met 420 425430 Ile Pro Leu Leu Pro Lys Asp Ser Glu Ser Gly Asn Thr Gln Ala Asp 435440 445 Lys Gly Lys Asn Gly Gly Thr Glu Phe Thr Arg Lys Phe Glu His Thr450 455 460 Pro Glu Ser Asp Lys Lys Asp Ala Gln Ala Gly Thr Gln Thr AsnGly 465 470 475 480 Ala Gln Thr Ala Ser Asn Thr Ala Gly Asp Thr Asn GlyLys Thr Lys 485 490 495 Thr Tyr Glu Val Glu Val Cys Cys Ser Asn Leu AsnTyr Leu Lys Tyr 500 505 510 Gly Met Leu Thr Arg Lys Asn Ser Lys Ser AlaMet Gln Ala Gly Gly 515 520 525 Asn Ser Ser Gln Ala Asp Ala Lys Thr GluGln Val Glu Gln Ser Met 530 535 540 Phe Leu Gln Gly Glu Arg Thr Asp GluLys Glu Ile Pro Thr Asp Gln 545 550 555 560 Asn Val Val Tyr Arg Gly SerTrp Tyr Gly His Ile Ala Asn Gly Thr 565 570 575 Ser Trp Ser Gly Asn AlaSer Asp Lys Glu Gly Gly Asn Arg Ala Asp 580 585 590 Phe Thr Ile Asn PheAla Asp Lys Lys Ile Thr Gly Lys Leu Thr Ala 595 600 605 Glu Asn Arg ThrAla Gln Thr Phe Thr Ile Glu Gly Met Ile Gln Gly 610 615 620 Asn Gly PheGlu Gly Thr Ala Lys Thr Ala 
Glu Ser Gly Phe Asp Leu 625 630 635 640 AspGln Lys Asn Thr Thr Arg Thr Pro Lys Ala Tyr Ile Thr Asp Ala 645 650 655Lys Val Lys Gly Gly Phe Tyr Gly Pro Lys Ala Glu Glu Leu Gly Gly 660 665670 Trp Phe Ala Tyr Pro Gly Asp Lys Gln Thr Glu Lys Ala Thr Ala Thr 675680 685 Ser Ser Asp Gly Asn Ser Ala Ser Ser Ala Thr Val Val Phe Gly Ala690 695 700 Lys Arg Gln Gln Pro Val Gln 705 710 708 amino acids aminoacid single linear 28 Met Asn Asn Pro Leu Val Asn Gln Ala Ala Met ValLeu Pro Val Phe 1 5 10 15 Leu Leu Ser Ala Cys Leu Gly Gly Gly Gly SerPhe Asp Leu Asp Ser 20 25 30 Val Asp Thr Glu Ala Pro Arg Pro Ala Pro LysTyr Gln Asp Val Ser 35 40 45 Ser Glu Lys Pro Gln Ala Gln Lys Asp Gln GlyGly Tyr Gly Phe Ala 50 55 60 Met Arg Phe Lys Arg Arg Asn Trp His Pro SerAla Asn Pro Lys Glu 65 70 75 80 Asp Glu Val Lys Leu Lys Asn Asp Asp TrpGlu Ala Thr Gly Leu Pro 85 90 95 Thr Glu Pro Lys Lys Leu Pro Leu Lys GlnGln Ser Val Ile Ser Glu 100 105 110 Val Glu Thr Asn Gly Asn Ser Lys MetTyr Thr Ser Pro Tyr Leu Ser 115 120 125 Gln Asp Ala Asp Ser Ser His AlaAsn Gly Ala Asn Gln Pro Lys Asn 130 135 140 Glu Val Thr Asp Tyr Lys LysPhe Lys Tyr Val Tyr Ser Gly Trp Phe 145 150 155 160 Tyr Lys His Ala LysSer Glu Val Lys Asn Glu Asn Gly Leu Val Ser 165 170 175 Ala Lys Arg GlyAsp Asp Gly Tyr Ile Phe Tyr His Gly Asp Lys Pro 180 185 190 Ser Arg GlnLeu Pro Ala Ser Glu Ala Val Thr Tyr Lys Gly Val Trp 195 200 205 His PheVal Thr Asp Thr Lys Gln Gly Gln Lys Phe Asn Asp Ile Leu 210 215 220 GluThr Ser Lys Gly Gln Gly Asp Lys Tyr Ser Gly Phe Ser Gly Asp 225 230 235240 Glu Gly Glu Thr Thr Ser Asn Arg Thr Asp Ser Asn Leu Asn Asp Lys 245250 255 His Glu Gly Tyr Gly Phe Thr Ser Asn Phe Lys Val Asp Phe Asn Asn260 265 270 Lys Lys Leu Thr Gly Lys Leu Ile Arg Asn Asn Lys Val Ile AsnThr 275 280 285 Ala Ala Ser Asp Gly Tyr Thr Thr Glu Tyr Tyr Ser Leu AspAla Thr 290 295 300 Leu Arg Gly Asn Arg Phe Ser Gly Lys Ala Ile Ala ThrAsp Lys Pro 305 310 315 320 Asn Thr Gly Gly Thr Lys Leu His Pro Phe ValPhe Asp Ser Ser Ser 325 330 335 Leu Ser 
Gly Gly Phe Phe Gly Pro Gln GlyGlu Glu Leu Gly Phe Arg 340 345 350 Phe Leu Ser Asp Asp Gly Lys Val AlaVal Val Gly Ser Ala Lys Thr 355 360 365 Lys Asp Ser Thr Ala Asn Gly AsnAla Pro Ala Ala Ser Ser Gly Pro 370 375 380 Gly Ala Ala Thr Met Pro SerGlu Thr Arg Leu Thr Thr Val Leu Asp 385 390 395 400 Ala Val Glu Leu ThrPro Asp Gly Lys Glu Ile Lys Asn Leu Asp Asn 405 410 415 Phe Ser Asn AlaThr Arg Leu Val Val Asp Gly Ile Met Ile Pro Leu 420 425 430 Leu Pro ThrGlu Ser Gly Asn Gly Gln Ala Asp Lys Gly Lys Asn Gly 435 440 445 Gly ThrAsp Phe Thr Tyr Glu Thr Thr Tyr Thr Pro Glu Ser Asp Lys 450 455 460 LysAsp Thr Lys Ala Gln Thr Gly Ala Gly Gly Met Gln Thr Ala Ser 465 470 475480 Gly Thr Ala Thr Val Asn Gly Gly Gln Val Gly Thr Lys Thr Tyr Lys 485490 495 Val Gln Val Cys Cys Ser Asn Leu Asn Tyr Leu Lys Tyr Gly Leu Leu500 505 510 Thr Arg Glu Asn Asn Asn Ser Val Met Gln Ala Val Lys Asn SerSer 515 520 525 Gln Ala Asp Ala Lys Thr Lys Gln Ile Glu Gln Ser Met PheLeu Gln 530 535 540 Gly Glu Arg Thr Asp Glu Asn Lys Ile Pro Gln Glu GlnGly Ile Val 545 550 555 560 Tyr Arg Gly Phe Trp Tyr Gly Arg Ile Ala AsnGly Thr Ser Trp Ser 565 570 575 Gly Lys Ala Ser Asn Ala Thr Asp Gly AsnArg Ala Lys Phe Thr Val 580 585 590 Asn Gly Asp Arg Lys Glu Ile Thr GlyThr Leu Thr Ala Glu Asn Arg 595 600 605 Ser Glu Ala Thr Phe Thr Ile AspAla Met Ile Glu Gly Asn Gly Phe 610 615 620 Lys Gly Thr Ala Lys Thr GlyAsn Asp Gly Phe Ala Pro Asp Gln Asn 625 630 635 640 Asn Ser Thr Val ThrHis Lys Val His Ile Ala Asn Ala Glu Val Gln 645 650 655 Gly Gly Phe TyrGly Pro Asn Ala Glu Glu Leu Gly Gly Trp Phe Ala 660 665 670 Tyr Pro GlyAsn Glu Gln Thr Lys Asn Ala Thr Val Glu Ser Gly Asn 675 680 685 Gly AsnSer Ala Ser Ser Ala Thr Val Val Phe Gly Ala Lys Arg Gln 690 695 700 LysLeu Val Lys 705 280 base pairs nucleic acid single linear 29 AGCCAACGAAGTTACAGGGC TTGGTAAGGT GGTCAAAACT GCCGAGACCA TCAATAAAGA 60 ACAAGTGCTAAACATTCGAG ACTTAACACG CTATGACCCT GGCATTGCTG TGGTTGAGCA 120 AGGTCGTGGGGCAAGCTCAG GCTATTCTAT TCGTGGTATG GATAAAAATC GTGTGGCGGT 
180 ATTGGTTGAT GGCATCAATC AAGCCCAGCA CTATGCCCTA CAAGGCCCTG TGGCAGGCAA 240 AAATTATGCC GCAGGTGGGG CAATCAACGA AATAGAATAC 280 7 amino acids amino acid single linear 30 Glu Gly Gly Phe Tyr Gly Pro 1 5 10 amino acids amino acid single linear 31 Ile Arg Asp Leu Thr Arg Tyr Asp Pro Gly 1 5 10 30 base pairs nucleic acid single linear 32 ATTCGAGACT TAACACGCTA TGACCCTGGC 30 30 base pairs nucleic acid single linear 33 ATTCGTGATT TAACTCGCTA TGACCCTGGT 30 32 base pairs nucleic acid single linear 34 TCGACGGTAT CGATGGCCTT AGGGGCCTAG GA 32 32 base pairs nucleic acid single linear 35 GCCATAGCTA CCGGAATCCC CGGATCCTTC GA 32 56 base pairs nucleic acid single linear 36 TATGTGTGGT GGCAGTGGTG GTTCAAATCC ACCTGCTCCT ACGCCCATTC CAAATG 56 58 base pairs nucleic acid single linear 37 ACACACCACC GTCACCACCA AGTTTAGGTG GACGAGGATG CGGGTAAGGT TTACGATC 58 104 base pairs nucleic acid single linear 38 GTCCAAATGC AAACGAGATG GGCGGGTCAT TTACACACAA CGCCGATGAC AGCAAAGCCT 60 CTGTGGTCTT TGGCACAAAA AGACAACAAG AAGTTAAGTA GTAG 104 105 base pairs nucleic acid single linear 39 GTTTACGTTT GCTCTACCCG CCCAGTAAAT GTGTGTTGCG GCTACTGTCG TTTCGGAGAC 60 ACCAGAAACC GTGTTTTTCT GTTGTTCTTC AATTCATCAT CCTAG 105 113 base pairs nucleic acid single linear 40 TATGAAACAC ATTCCTTTAA CCACACTGTG TGTGGCAATC TCTGCCGTCT TATTAACCGC 60 TTGTGGTGGC AGTGGTGGTT CAAATCCACC TGCTCCTACG CCCATTCCAA ATG 113 115 base pairs nucleic acid single linear 41 ACTTTGTGTA AGGAAATTGG TGTGACACAC ACCGTTAGAG ACGGCAGAAT AATTGGCGAA 60 CACCACCGTC ACCACCAAGT TTAGGTGGAC GAGGATGCGG GTAAGGTTTA CGATC 115 40 base pairs nucleic acid single linear 42 GAATTCCATA TGTGTGGTGG GAGCTCTGGT GGTTTCAATC 40 30 base pairs nucleic acid single linear 43 CCCATGGCAG GTTCTTGAAT GCCTGAAACT 30 30 base pairs nucleic acid single linear 44 GAATTCCATA TGAAACACAT TCCTTTAACC 30 2121 base pairs nucleic acid single linear 45 ATGAAACACA TTCCTTTAAC CACACTGTGT GTGGCAATCT CTGCCGTCTT ATTAACCGCT 60 TGTGGTGGCA GTGGTGGTTC AAATCCACCT GCTCCTACGC CCATTCCAAA TGCTAGCGGT 120 TCAGGTAATA CTGGCAACAC TGGTAATGCT GGCGGTACTG ATAATACAGC CAATGCAGGT 180
AATACAGGCGGTACAAACTC TGGTACAGGC AGTGCCAACA CACCAGAACC AAAATATAAA 240 GATGTGCCAACCGATGAAAA TAAAAAAGAT GAAGTGTCAG GCATTCAAGA ACCTGCCATG 300 GGTTATGGCATGGCTTTGAG TAAAATGAAT CTACACAAAC AACAAGACAC GCCATTAGAT 360 GAAAAAGATATCATTACCTT AGACGGTAAA AAACAAGTTG CAAAAGGTGA AAAATCGCCA 420 TTGCCATTTTCGTTGGATGT AGAAAATAAA TTGCTTGATG GCTATATAGC AAAAATGAAT 480 GAAGCGGATAAAAATGCCAT TGGTGACAGA ATTAAGAAAG ATAATAAAGA CAAGTCATTA 540 TCTAAAGCAGAGCTTGCCAA ACAAATCAAA GAAGATGTGC GTAAAAGCCA TGAGTTTCAG 600 CAAGTATTATCATCACTGAA AAACAAAATT TTTCATTCAA ATGATGGAAC AACCAAAGCA 660 ACCACACGAGATTTACAATA TGTTGATTAT GGTTACTACT TGGTGAATGA TGGCAATTAT 720 CTAACCGTCAAAACAGACGA ACTTTGGAAT TTAGGCCCTG TGGGCGGTGT GTTTTATAAT 780 GGCACAACGACCGCCAAAGA GCTACCCACA CAAGATGCGG TCAAATATAA AGGACATTGG 840 GACTTTATGACCGATGTTGC CAAACAAAGA AACCGATTTA GCGAAGTGAA AGAAAACCTT 900 CAAGCAGGTCGGTATTATGG AGCATCTTCA AAAGATGAAT ACAACCGCTT ATTAACTGAT 960 GAGAAAAACAAACCAGAGCG TTATAACGGT GAATATGGTC ATAGCAGTGA GTTTACTGTT 1020 AATTTTAAGGACAAAAAATT AACAGGTGAG CTGTTTAGTA ACCTACAAGA CAGCCGTAAG 1080 GGCAATGTTACGAAAACCAA ACGCTATGAC ATCGATGCCA ATATCTACGG CAACCGCTTC 1140 CGTGGCAGTGCCACCGCAAG CGATAAAGCA GAAGCAAGCA AAACCAAACA CCCCTTTACC 1200 AGCGATGCCAAAAATAGCCT AGAAGGCGGT TTTTATGGAC CAAACGCCGA GGAGCTGGCA 1260 GGTAAATTCCTAACCAATGA CAACAAACTC TTTGGCGTCT TTGGTGCTAA ACGAGAGAGT 1320 AAAGCTGGGGAAAAAACCGA AGCCATCTTA GATGCCTATG CACTTGGGAC ATTTAACAAA 1380 AATAACGCAACCACATTCAC CCCATTTACC AAAAAACAAC TGGATAACTT TGGCAATGCC 1440 AAAAAGTTGGTCTTGGGTTC TACCGTCATT GATTTGGTGC CTACCGGTGT CACCAAAGAT 1500 GTCAATGAATTCACCAAAAA CAAGCCAGAT TCTGCCACAA ACAAAGCGGG CGAGACTTTG 1560 ATGGTGAATGATAAAGTTAG CGTCAAAACC TATGGCTATG GCAGAAACTT TGAATACCTA 1620 AAATTTGGTGAGCTCAGTGT CGGCACAAGC AACAGCGTCT TTTTACAAGG CGAACGCACC 1680 GCTACCACAGGCGAGAAAGC CGTACCAACC AAAGGCACAG CCAAATATTT GGGGAACTGG 1740 GTAGGATACATCACAGGAAA GGACTCATCA AAAAGCTTTA ATGAGGCCCA AGATGTTGCT 1800 GATTTTGACATTGACTTTGA GAAAAAATCA GTTAAAGGCA AACTGACCAC CAAAGACCGC 1860 CAAGACCCTGTATTTAACAT CACAGGTGAC ATCGCAGGCA 
ATGGCTGGAC AGGCAAAGCC 1920 AGCACCACCAAAGCGGACGC AGGGGGCTAC AAGATAGATT CTAGCAGTAC AGGCAAATCC 1980 ATCGTCATCAAAGATGCCGA GGTTACAGGG GGCTTTTATG GTCCAAATGC AAACGAGATG 2040 GGCGGGTCATTTACACACAA CACCGATGAC AGTAAAGCCT CTGTGGTCTT TGGCACAAAA 2100 AGACAAGAAGAAGTTAAGTA G 2121 706 amino acids amino acid linear 46 Met Lys His IlePro Leu Thr Thr Leu Cys Val Ala Ile Ser Ala Val 1 5 10 15 Leu Leu ThrAla Cys Gly Gly Ser Gly Gly Ser Asn Pro Pro Ala Pro 20 25 30 Thr Pro IlePro Asn Ala Ser Gly Ser Gly Asn Thr Gly Asn Thr Gly 35 40 45 Asn Ala GlyGly Thr Asp Asn Thr Ala Asn Ala Gly Asn Thr Gly Gly 50 55 60 Thr Asn SerGly Thr Gly Ser Ala Asn Thr Pro Glu Pro Lys Tyr Lys 65 70 75 80 Asp ValPro Thr Asp Glu Asn Lys Lys Asp Glu Val Ser Gly Ile Gln 85 90 95 Glu ProAla Met Gly Tyr Gly Met Ala Leu Ser Lys Met Asn Leu His 100 105 110 LysGln Gln Asp Thr Pro Leu Asp Glu Lys Asp Ile Ile Thr Leu Asp 115 120 125Gly Lys Lys Gln Val Ala Lys Gly Glu Lys Ser Pro Leu Pro Phe Ser 130 135140 Leu Asp Val Glu Asn Lys Leu Leu Asp Gly Tyr Ile Ala Lys Met Asn 145150 155 160 Glu Ala Asp Lys Asn Ala Ile Gly Asp Arg Ile Lys Lys Asp AsnLys 165 170 175 Asp Lys Ser Leu Ser Lys Ala Glu Leu Ala Lys Gln Ile LysGlu Asp 180 185 190 Val Arg Lys Ser His Glu Phe Gln Gln Val Leu Ser SerLeu Lys Asn 195 200 205 Lys Ile Phe His Ser Asn Asp Gly Thr Thr Lys AlaThr Thr Arg Asp 210 215 220 Leu Gln Tyr Val Asp Tyr Gly Tyr Tyr Leu ValAsn Asp Gly Asn Tyr 225 230 235 240 Leu Thr Val Lys Thr Asp Glu Leu TrpAsn Leu Gly Pro Val Gly Gly 245 250 255 Val Phe Tyr Asn Gly Thr Thr ThrAla Lys Glu Leu Pro Thr Gln Asp 260 265 270 Ala Val Lys Tyr Lys Gly HisTrp Asp Phe Met Thr Asp Val Ala Lys 275 280 285 Gln Arg Asn Arg Phe SerGlu Val Lys Glu Asn Leu Gln Ala Gly Arg 290 295 300 Tyr Tyr Gly Ala SerSer Lys Asp Glu Tyr Asn Arg Leu Leu Thr Asp 305 310 315 320 Glu Lys AsnLys Pro Glu Arg Tyr Asn Gly Glu Tyr Gly His Ser Ser 325 330 335 Glu PheThr Val Asn Phe Lys Asp Lys Lys Leu Thr Gly Glu Leu Phe 340 345 350 SerAsn Leu Gln Asp Ser Arg Lys Gly Asn Val Thr Lys 
Thr Lys Arg 355 360 365Tyr Asp Ile Asp Ala Asn Ile Tyr Gly Asn Arg Phe Arg Gly Ser Ala 370 375380 Thr Ala Ser Asp Lys Ala Glu Ala Ser Lys Thr Lys His Pro Phe Thr 385390 395 400 Ser Asp Ala Lys Asn Ser Leu Glu Gly Gly Phe Tyr Gly Pro AsnAla 405 410 415 Glu Glu Leu Ala Gly Lys Phe Leu Thr Asn Asp Asn Lys LeuPhe Gly 420 425 430 Val Phe Gly Ala Lys Arg Glu Ser Lys Ala Gly Glu LysThr Glu Ala 435 440 445 Ile Leu Asp Ala Tyr Ala Leu Gly Thr Phe Asn LysAsn Asn Ala Thr 450 455 460 Thr Phe Thr Pro Phe Thr Lys Lys Gln Leu AspAsn Phe Gly Asn Ala 465 470 475 480 Lys Lys Leu Val Leu Gly Ser Thr ValIle Asp Leu Val Pro Thr Gly 485 490 495 Val Thr Lys Asp Val Asn Glu PheThr Lys Asn Lys Pro Asp Ser Ala 500 505 510 Thr Asn Lys Ala Gly Glu ThrLeu Met Val Asn Asp Lys Val Ser Val 515 520 525 Lys Thr Tyr Gly Tyr GlyArg Asn Phe Glu Tyr Leu Lys Phe Gly Glu 530 535 540 Leu Ser Val Gly ThrSer Asn Ser Val Phe Leu Gln Gly Glu Arg Thr 545 550 555 560 Ala Thr ThrGly Glu Lys Ala Val Pro Thr Lys Gly Thr Ala Lys Tyr 565 570 575 Leu GlyAsn Trp Val Gly Tyr Ile Thr Gly Lys Asp Ser Ser Lys Ser 580 585 590 PheAsn Glu Ala Gln Asp Val Ala Asp Phe Asp Ile Asp Phe Glu Lys 595 600 605Lys Ser Val Lys Gly Lys Leu Thr Thr Lys Asp Arg Gln Asp Pro Val 610 615620 Phe Asn Ile Thr Gly Asp Ile Ala Gly Asn Gly Trp Thr Gly Lys Ala 625630 635 640 Ser Thr Thr Lys Ala Asp Ala Gly Gly Tyr Lys Ile Asp Ser SerSer 645 650 655 Thr Gly Lys Ser Ile Val Ile Lys Asp Ala Glu Val Thr GlyGly Phe 660 665 670 Tyr Gly Pro Asn Ala Asn Glu Met Gly Gly Ser Phe ThrHis Asn Thr 675 680 685 Asp Asp Ser Lys Ala Ser Val Val Phe Gly Thr LysArg Gln Glu Glu 690 695 700 Val Lys 705 2287 base pairs nucleic acidsingle linear 47 AAATTTGCCG TATTTTGTCT ATCATAAATG CATTTATCAT CAATGCCCAAACAAATACGC 60 CAAATGCACA TTGTCAGCAT GCCAAAATAG GCATTAACAG ACTTTTTTAGATAATACCAT 120 CAACCCATCA GAGGATTATT TTATGAAACA CATTCCTTTA ACCACACTGTGTGTGGCAAT 180 CTCTGCCGTC TTATTAACCG CTTGTGGTGG CAGTGGTGGT TCAAATCCACCTGCTCCTAC 240 GCCCATTCCA AATGCTAGCG GTTCAGGTAA TACTGGCAAC 
ACTGGTAATGCTGGCGGTAC 300 TGATAATACA GCCAATGCAG GTAATACAGG CGGTACAAGC TCTGGTACAGGCAGTGCCAG 360 CACGTCAGAA CCAAAATATC AAGATGTGCC AACAACGCCC AATAACAAAGAACAAGTTTC 420 ATCCATTCAA GAACCTGCCA TGGGTTATGG CATGGCTTTG AGTAAAATTAATCTATACGA 480 CCAACAAGAC ACGCCATTAG ATGCAAAAAA TATCATTACC TTAGACGGTAAAAAACAAGT 540 TGCTGACAAT CAAAAATCAC CATTGCCATT TTCGTTAGAT GTAGAAAATAAATTGCTTGA 600 TGGCTATATA GCAAAAATGA ATGAAGCGGA TAAAAATGCC ATTGGTGAAAGAATTAAGAG 660 AGAAAATGAA CAAAATAAAA AAATATCCGA TGAAGAACTT GCCAAAAAAATCAAAGAAAA 720 TGTGCGTAAA AGCCCTGAGT TTCAGCAAGT ATTATCATCG ATAAAAGCGAAAACTTTCCA 780 TTCAAATGAC AAAACAACCA AAGCAACCAC ACGAGATTTA AAATATGTTGATTATGGTTA 840 CTACTTGGTG AATGATGCCA ATTATCTAAC CGTCAAAACA GACAAACCAAAACTTTGGAA 900 TTCAGGTCCT GTGGGCGGTG TGTTTTATAA TGGCTCAACG ACCGCCAAAGAGCTGCCCAC 960 ACAAGATGCG GTCAAATATA AAGGACATTG GGACTTTATG ACCGATGTTGCCAAAAAAAG 1020 AAACCGATTT AGCGAAGTAA AAGAAACCTA TCAAGCAGGC TGGTGGTATGGGGCATCTTC 1080 AAAAGATGAA TACAACCGCT TATTAACCAA AGCAGATGCC GCACCTGATAATTATAGCGG 1140 TGAATATGGT CATAGCAGTG AATTTACTGT TAATTTTAAG GAAAAAAAATTAACAGGTGA 1200 GCTGTTTAGT AACCTACAAG ACAGCCATAA ACAAAAAGTA ACCAAAACAAAACGCTATGA 1260 TATTAAGGCT GATATCCACG GCAACCGCTT CCGTGGCAGT GCCACCGCAACGGATAAGGC 1320 AGAAGACAGC AAAAGCAAAC ACCCCTTTAC CAGCCATGCC AAAGATAAGCTAGAAGGTGG 1380 TTTTTATGGA CCAAAAGGCG AGGAGCTGGC AGGTAAATTC TTAACCGATGATAACAAACT 1440 CTTTGGTGTC TTTGGTGCCA AACAAGAGGG TAATGTAGAA AAAACCGAAGCCATCTTAGA 1500 TGCTTATGCA CTTGGGACAT TTAATAAACC TGGTACGACC AATCCCGCCTTTACCGCTAA 1560 CAGCAAAAAA GAACTGGATA ACTTTGGCAA TGCCAAAAAG TTGGTCTTGGGTTCTACCGT 1620 CATTGATTTG GTGCCTACTG ATGCCACCAA AGATGTCAAT GAATTCAAAGAAAAGCCAAA 1680 GTCTGCCACA AACAAAGCGG GCGAAACTTT GATGGTGAAT GATGAAGTTAGCGTCAAAAC 1740 CTATGGCAAA AACTTTGAAT ACCTAAAATT TGGTGAGCTT AGTGTCGGTGGTAGCCATAG 1800 CGTCTTTTTA CAAGGCGAAC GCACCGCTAC CACAGGCGAG AAAGCCGTACCAACCACAGG 1860 CAAAGCCAAA TATTTGGGGA ACTGGGTAGG ATATATCACA GGAGCGGACTCATCAAAAGG 1920 CTCTACCGAT GGCAAAGGCT TTACCGATGC CAAAGATATT GCTGATTTTGACATTGACTT 1980 TGAGAAAAAA TCAGTTAATG 
GCAAACTGAC CACCAAAGAC CGCCAAGACCCTGTCTTTAA 2040 CATCACAGGT GAAATCGCAG GCAATGGCTG GACAGGTAAA GCCAGCACCGCCGAAGCGAA 2100 CGCAGGGGGC TATAAGATAG ATTCTAGCAG TACAGGCAAA TCCATCGTCATCAAAGATGC 2160 CGTGGTTACA GGTGGCTTTT ATGGTCCAAA TGCAACCGAG ATGGGTGGGTCATTTACACA 2220 CAACAGCGGT AATGATGGTA AAGTCTCTGT GGTCTTTGGC ACAAAAAAACAAGAAGTTAA 2280 GAAGTGA 2287 2145 base pairs nucleic acid single linear48 ATGAAACACA TTCCTTTAAC CACACTGTGT GTGGCAATCT CTGCCGTCTT ATTAACCGCT 60TGTGGTGGCA GTGGTGGTTC AAATCCACCT GCTCCTACGC CCATTCCAAA TGCTAGCGGT 120TCAGGTAATA CTGGCAACAC TGGTAATGCT GGCGGTACTG ATAATACAGC CAATGCAGGT 180AATACAGGCG GTACAAGCTC TGGTACAGGC AGTGCCAGCA CGTCAGAACC AAAATATCAA 240GATGTGCCAA CAACGCCCAA TAACAAAGAA CAAGTTTCAT CCATTCAAGA ACCTGCCATG 300GGTTATGGCA TGGCTTTGAG TAAAATTAAT CTATACGACC AACAAGACAC GCCATTAGAT 360GCAAAAAATA TCATTACCTT AGACGGTAAA AAACAAGTTG CTGACAATCA AAAATCACCA 420TTGCCATTTT CGTTAGATGT AGAAAATAAA TTGCTTGATG GCTATATAGC AAAAATGAAT 480GAAGCGGATA AAAATGCCAT TGGTGAAAGA ATTAAGAGAG AAAATGAACA AAATAAAAAA 540ATATCCGATG AAGAACTTGC CAAAAAAATC AAAGAAAATG TGCGTAAAAG CCCTGAGTTT 600CAGCAAGTAT TATCATCGAT AAAAGCGAAA ACTTTCCATT CAAATGACAA AACAACCAAA 660GCAACCACAC GAGATTTAAA ATATGTTGAT TATGGTTACT ACTTGGTGAA TGATGCCAAT 720TATCTAACCG TCAAAACAGA CAAACCAAAA CTTTGGAATT CAGGTCCTGT GGGCGGTGTG 780TTTTATAATG GCTCAACGAC CGCCAAAGAG CTGCCCACAC AAGATGCGGT CAAATATAAA 840GGACATTGGG ACTTTATGAC CGATGTTGCC AAAAAAAGAA ACCGATTTAG CGAAGTAAAA 900GAAACCTATC AAGCAGGCTG GTGGTATGGG GCATCTTCAA AAGATGAATA CAACCGCTTA 960TTAACCAAAG CAGATGCCGC ACCTGATAAT TATAGCGGTG AATATGGTCA TAGCAGTGAA 1020TTTACTGTTA ATTTTAAGGA AAAAAAATTA ACAGGTGAGC TGTTTAGTAA CCTACAAGAC 1080AGCCATAAAC AAAAAGTAAC CAAAACAAAA CGCTATGATA TTAAGGCTGA TATCCACGGC 1140AACCGCTTCC GTGGCAGTGC CACCGCAACG GATAAGGCAG AAGACAGCAA AAGCAAACAC 1200CCCTTTACCA GCCATGCCAA AGATAAGCTA GAAGGTGGTT TTTATGGACC AAAAGGCGAG 1260GAGCTGGCAG GTAAATTCTT AACCGATGAT AACAAACTCT TTGGTGTCTT TGGTGCCAAA 1320CAAGAGGGTA ATGTAGAAAA AACCGAAGCC ATCTTAGATG CTTATGCACT TGGGACATTT 1380AATAAACCTG 
GTACGACCAA TCCCGCCTTT ACCGCTAACA GCAAAAAAGA ACTGGATAAC 1440TTTGGCAATG CCAAAAAGTT GGTCTTGGGT TCTACCGTCA TTGATTTGGT GCCTACTGAT 1500GCCACCAAAG ATGTCAATGA ATTCAAAGAA AAGCCAAAGT CTGCCACAAA CAAAGCGGGC 1560GAAACTTTGA TGGTGAATGA TGAAGTTAGC GTCAAAACCT ATGGCAAAAA CTTTGAATAC 1620CTAAAATTTG GTGAGCTTAG TGTCGGTGGT AGCCATAGCG TCTTTTTACA AGGCGAACGC 1680ACCGCTACCA CAGGCGAGAA AGCCGTACCA ACCACAGGCA AAGCCAAATA TTTGGGGAAC 1740TGGGTAGGAT ATATCACAGG AGCGGACTCA TCAAAAGGCT CTACCGATGG CAAAGGCTTT 1800ACCGATGCCA AAGATATTGC TGATTTTGAC ATTGACTTTG AGAAAAAATC AGTTAATGGC 1860AAACTGACCA CCAAAGACCG CCAAGACCCT GTCTTTAACA TCACAGGTGA AATCGCAGGC 1920AATGGCTGGA CAGGTAAAGC CAGCACCGCC GAAGCGAACG CAGGGGGCTA TAAGATAGAT 1980TCTAGCAGTA CAGGCAAATC CATCGTCATC AAAGATGCCG TGGTTACAGG TGGCTTTTAT 2040GGTCCAAATG CAACCGAGAT GGGTGGGTCA TTTACACACA ACAGCGGTAA TGATGGTAAA 2100GTCTCTGTGG TCTTTGGCAC AAAAAAACAA GAAGTTAAGA AGTGA 2145 713 amino acidsamino acid linear 49 Met Lys His Ile Pro Leu Thr Thr Leu Cys Val Ala IleSer Ala Val 1 5 10 15 Leu Leu Thr Ala Cys Gly Gly Ser Gly Gly Ser AsnPro Pro Ala Pro 20 25 30 Thr Pro Ile Pro Asn Ala Ser Gly Ser Gly Asn ThrGly Asn Thr Gly 35 40 45 Asn Ala Gly Gly Thr Asp Asn Thr Ala Asn Ala GlyAsn Thr Gly Gly 50 55 60 Thr Ser Ser Gly Thr Gly Ser Ala Ser Thr Ser GluPro Lys Tyr Gln 65 70 75 80 Asp Val Pro Thr Thr Pro Asn Asn Lys Glu GlnVal Ser Ser Ile Gln 85 90 95 Glu Pro Ala Met Gly Tyr Gly Met Ala Leu SerLys Ile Asn Leu Tyr 100 105 110 Asp Gln Gln Asp Thr Pro Leu Asp Ala LysAsn Ile Ile Thr Leu Asp 115 120 125 Gly Lys Lys Gln Val Ala Asp Asn GlnLys Ser Pro Leu Pro Phe Ser 130 135 140 Leu Asp Val Glu Asn Lys Leu LeuAsp Gly Tyr Ile Ala Lys Met Asn 145 150 155 160 Glu Ala Asp Lys Asn AlaIle Gly Glu Arg Ile Lys Arg Glu Asn Glu 165 170 175 Gln Asn Lys Lys IleSer Asp Glu Glu Leu Ala Lys Lys Ile Lys Glu 180 185 190 Asn Val Arg LysSer Pro Glu Phe Gln Gln Val Leu Ser Ser Ile Lys 195 200 205 Ala Lys ThrPhe His Ser Asn Asp Lys Thr Thr Lys Ala Thr Thr Arg 210 215 220 Asp LeuLys Tyr Val Asp Tyr Gly Tyr Tyr 
Leu Val Asn Asp Ala Asn 225 230 235 240Tyr Leu Thr Val Lys Thr Asp Lys Pro Lys Leu Trp Asn Ser Gly Pro 245 250255 Val Gly Gly Val Phe Tyr Asn Gly Ser Thr Thr Ala Lys Glu Leu Pro 260265 270 Thr Gln Asp Ala Val Lys Tyr Lys Gly His Trp Asp Phe Met Thr Asp275 280 285 Val Ala Lys Lys Arg Asn Arg Phe Ser Glu Val Lys Glu Thr TyrGln 290 295 300 Ala Gly Trp Trp Tyr Gly Ala Ser Ser Lys Asp Glu Tyr AsnArg Leu 305 310 315 320 Leu Thr Lys Ala Asp Ala Ala Pro Asp Asn Tyr SerGly Glu Tyr Gly 325 330 335 His Ser Ser Glu Phe Thr Val Asn Phe Lys GluLys Lys Leu Thr Gly 340 345 350 Glu Leu Phe Ser Asn Leu Gln Asp Ser HisLys Gln Lys Val Thr Lys 355 360 365 Thr Lys Arg Tyr Asp Ile Lys Ala AspIle His Gly Asn Arg Phe Arg 370 375 380 Gly Ser Ala Thr Ala Thr Asp LysAla Glu Asp Ser Lys Ser Lys His 385 390 395 400 Pro Phe Thr Ser His AlaLys Asp Lys Leu Glu Gly Gly Phe Tyr Gly 405 410 415 Pro Lys Gly Glu GluLeu Ala Gly Lys Phe Leu Thr Asp Asp Asn Lys 420 425 430 Leu Phe Gly ValPhe Gly Ala Lys Gln Glu Gly Asn Val Glu Lys Thr 435 440 445 Glu Ala IleLeu Asp Ala Tyr Ala Leu Gly Thr Phe Asn Lys Pro Gly 450 455 460 Thr ThrAsn Pro Ala Phe Thr Ala Asn Ser Lys Lys Glu Leu Asp Asn 465 470 475 480Phe Gly Asn Ala Lys Lys Leu Val Leu Gly Ser Thr Val Ile Asp Leu 485 490495 Val Pro Thr Asp Ala Thr Lys Asp Val Asn Glu Phe Lys Glu Lys Pro 500505 510 Lys Ser Ala Thr Asn Lys Ala Gly Glu Thr Leu Met Val Asn Asp Glu515 520 525 Val Ser Val Lys Thr Tyr Gly Lys Asn Phe Glu Tyr Leu Lys PheGly 530 535 540 Glu Leu Ser Val Gly Gly Ser His Ser Val Phe Leu Gln GlyGlu Arg 545 550 555 560 Thr Ala Thr Thr Gly Glu Lys Ala Val Pro Thr ThrGly Lys Ala Lys 565 570 575 Tyr Leu Gly Asn Trp Val Gly Tyr Ile Thr GlyAla Asp Ser Ser Lys 580 585 590 Gly Ser Thr Asp Gly Lys Gly Phe Thr AspAla Lys Asp Ile Ala Asp 595 600 605 Phe Asp Ile Asp Phe Glu Lys Lys SerVal Asn Gly Lys Leu Thr Thr 610 615 620 Lys Asp Arg Gln Asp Pro Val PheAsn Ile Thr Gly Glu Ile Ala Gly 625 630 635 640 Asn Gly Trp Thr Gly LysAla Ser Thr Ala Glu Ala Asn Ala Gly Gly 645 
650 655 Tyr Lys Ile Asp SerSer Ser Thr Gly Lys Ser Ile Val Ile Lys Asp 660 665 670 Ala Val Val ThrGly Gly Phe Tyr Gly Pro Asn Ala Thr Glu Met Gly 675 680 685 Gly Ser PheThr His Asn Ser Gly Asn Asp Gly Lys Val Ser Val Val 690 695 700 Phe GlyThr Lys Lys Gln Glu Val Lys 705 710 2139 base pairs nucleic acid singlelinear 50 ATGAAACACA TTCCTTTAAC CACACTGTGT GTGGCAATCT CTGCCGTCTTATTAACCGCT 60 TGTGGTGGCA GTGGTGGTTC AAATCCACCT GCTCCTACGC CCATTCCAAATGCAGGCGGT 120 GCAGGTAATG CTGGTAGCGG TACTGGCGGT GCAGGTAGCA CTGATAATGCAGCCAATGCA 180 GGCAGTACAG GCGGTGCAAG CTCTGGTACA GGCAGTGCCA GCACACAAAAACCAAAATAT 240 CAAGATGTGC CAACCGATAA AAATAAAAAA GATGAAGTGT CAGGCATTCAAGAACCTGCC 300 ATGGGTTATG GCGTGGAATT AAAGCTTCGT AACTGGATAC CACAAGAACAGGAAGAACAT 360 GCCAAAATCA ATACAAATGA TGTTGTAAAA CTTGAAGGTG ACTTGAAGCATAATCCATTT 420 GACAACTCTA TTTGGCAAAA CATCAAAAAT AGCAAAGAAG TACAAACTGTTTACAACCAA 480 GAGAAGCAAA ACATTGAAAA TCAAATCAAA AAAGAAAATA AAGAACTTGATAAAACGGCA 540 CTAAAAGCTC TTATTGAAAA AGTTCTTGAT GACTATCTAA CAAGTCTTGCTAAACCCATT 600 TATGAAAAAA ATATTAATGA TTCACATGAT AAGCAGAATA AAGCACGCACTCGTGATTTG 660 AAGTATGTGC GTTCTGGTTA TATTTATCGC TCAGGTTATT CTAATATCGACATTCAAAAG 720 AAAATAGCTA AAACTGGTTT TGATGGTGCT TTATTTTATA AAGGTACACAAACTGCTAAA 780 CAATTGCCTG TATCTGAGGT TAAGTATAAA GGCACTTGGG ATTTTATGACCGATGCCAAA 840 AAAGGACAAT CATTTAGCAG TTTTGAAAGA CGAGCTGGTG ATCGCTATAGTGCAATGTCT 900 TCCCATGAGT ACCCATCTTT ATTAACTGAT GATAAAAACA AACCAGATAATTATAACGAT 960 GAATATGGTC ATAGCAGTGA GTTTACGGTA GATTTTAGTA AAAAGAGCCTAACAGGTGGG 1020 CTGTTTAGTA ACCTACAAGA CCACCATAAG GGCAAGGTTA CGAAAACCAAACGCTATGAC 1080 ATCAATGCCC GTATCCACGG TAACCGCTTC CGTGGCAGTG CCACCGCAATCAATAAAGAT 1140 AATGAAAGCA AAGCCAAACA CCCCTTTACC AGCGATGCCG ACAATAGGCTAGAAGGCGGT 1200 TTTTATGGAC CAAACGCCGA GGAGCTGGCA GGTAAATTCC TAACCGATGACAACAAACTC 1260 TTTGGTGTCT TTGGTGCTAA ACAAGAGAGT GAAGCTAAGG AAACCGAAGCCATCTTAGAT 1320 GCTTATGCAC TTGGGACATT TAATAAATCT GGTACGACCA ATCCTGCCTTTACCGCCAAT 1380 AGTAAAAAAG AACTGGATAA CTTTGGCAAT ATTAATAAAT TGGTCTTGGGTTCTACTGTG 1440 ATAGACCTTA 
CTCAAGGTAA TGATTTTGTA AAAACCATTG ATAAAGAAAAGCCAGCCACC 1500 ACTACCAATC AAGCAGGCGA GCCTTTGACG GTGAATGATA AGGTTCGGGTACAAGTTTGT 1560 TGTAGCAATC TTGAGCATCT AAAATTTGGC TCACTGAGTA TCGGTGATAGTAATAGCGTC 1620 TTTTTACAAG GTGAACGCAC CGCTACCAAA GGTGATAAAG ATAAAGCCATGCCAGTTGCA 1680 GGAAATGCTA AATACCGTGG TACATGGGCA GGCTATGTTG CAGGCTCTGGCAATACCAGC 1740 AAAGCCTATG AAGCCCAACA ATTTGCTGAC AATGCCAACC GTGCCGAGTTTGATGTAGAC 1800 TTTGCTAACA AAAGCCTAAC TGGTAAGCTT ATTCCAAATA CGAGCAGTGATGGTAAATCT 1860 GCTTTTGATA TTACTGCTAC AATTGATGGC AATGGTTTTA GTGGTAAAGCCAATACACCA 1920 GATATTGAAA CAGGTGGGTT AAAGATTGAC AGTAAGAACA GTGAAAGCGGCCGAGTAATT 1980 GTGAAAGATG CTATAGTTAT AGGTGGCTTT TATGGTCCAC AAGCTAATGAACTGGGTGGC 2040 TCATTTACCT ACAAGAGCAA TGATGCTGGA AATCAAGACA AAGACAGTAGTGCATCTGTG 2100 GTCTTTGGTG CAAGAAAACA ACAAGAAGTC AAACCATGA 2139 712amino acids amino acid single linear 51 Met Lys His Ile Pro Leu Thr ThrLeu Cys Val Ala Ile Ser Ala Val 1 5 10 15 Leu Leu Thr Ala Cys Gly GlySer Gly Gly Ser Asn Pro Pro Ala Pro 20 25 30 Thr Pro Ile Pro Asn Ala GlyGly Ala Gly Asn Ala Gly Ser Gly Thr 35 40 45 Gly Gly Ala Gly Ser Thr AspAsn Ala Ala Asn Ala Gly Ser Thr Gly 50 55 60 Gly Ala Ser Ser Gly Thr GlySer Ala Ser Thr Gln Lys Pro Lys Tyr 65 70 75 80 Gln Asp Val Pro Thr AspLys Asn Lys Lys Asp Glu Val Ser Gly Ile 85 90 95 Gln Glu Pro Ala Met GlyTyr Gly Val Glu Leu Lys Leu Arg Asn Trp 100 105 110 Ile Pro Gln Glu GlnGlu Glu His Ala Lys Ile Asn Thr Asn Asp Val 115 120 125 Val Lys Leu GluGly Asp Leu Lys His Asn Pro Phe Asp Asn Ser Ile 130 135 140 Trp Gln AsnIle Lys Asn Ser Lys Glu Val Gln Thr Val Tyr Asn Gln 145 150 155 160 GluLys Gln Asn Ile Glu Asn Gln Ile Lys Lys Glu Asn Lys Glu Leu 165 170 175Asp Lys Thr Ala Leu Lys Ala Leu Ile Glu Lys Val Leu Asp Asp Tyr 180 185190 Leu Thr Ser Leu Ala Lys Pro Ile Tyr Glu Lys Asn Ile Asn Asp Ser 195200 205 His Asp Lys Gln Asn Lys Ala Arg Thr Arg Asp Leu Lys Tyr Val Arg210 215 220 Ser Gly Tyr Ile Tyr Arg Ser Gly Tyr Ser Asn Ile Asp Ile GlnLys 225 230 235 240 Lys Ile Ala Lys Thr Gly Phe Asp 
Gly Ala Leu Phe TyrLys Gly Thr 245 250 255 Gln Thr Ala Lys Gln Leu Pro Val Ser Glu Val LysTyr Lys Gly Thr 260 265 270 Trp Asp Phe Met Thr Asp Ala Lys Lys Gly GlnSer Phe Ser Ser Phe 275 280 285 Glu Arg Arg Ala Gly Asp Arg Tyr Ser AlaMet Ser Ser His Glu Tyr 290 295 300 Pro Ser Leu Leu Thr Asp Asp Lys AsnLys Pro Asp Asn Tyr Asn Asp 305 310 315 320 Glu Tyr Gly His Ser Ser GluPhe Thr Val Asp Phe Ser Lys Lys Ser 325 330 335 Leu Thr Gly Gly Leu PheSer Asn Leu Gln Asp His His Lys Gly Lys 340 345 350 Val Thr Lys Thr LysArg Tyr Asp Ile Asn Ala Arg Ile His Gly Asn 355 360 365 Arg Phe Arg GlySer Ala Thr Ala Ile Asn Lys Asp Asn Glu Ser Lys 370 375 380 Ala Lys HisPro Phe Thr Ser Asp Ala Asp Asn Arg Leu Glu Gly Gly 385 390 395 400 PheTyr Gly Pro Asn Ala Glu Glu Leu Ala Gly Lys Phe Leu Thr Asp 405 410 415Asp Asn Lys Leu Phe Gly Val Phe Gly Ala Lys Gln Glu Ser Glu Ala 420 425430 Lys Glu Thr Glu Ala Ile Leu Asp Ala Tyr Ala Leu Gly Thr Phe Asn 435440 445 Lys Ser Gly Thr Thr Asn Pro Ala Phe Thr Ala Asn Ser Lys Lys Glu450 455 460 Leu Asp Asn Phe Gly Asn Ile Asn Lys Leu Val Leu Gly Ser ThrVal 465 470 475 480 Ile Asp Leu Thr Gln Gly Asn Asp Phe Val Lys Thr IleAsp Lys Glu 485 490 495 Lys Pro Ala Thr Thr Thr Asn Gln Ala Gly Glu ProLeu Thr Val Asn 500 505 510 Asp Lys Val Arg Val Gln Val Cys Cys Ser AsnLeu Glu His Leu Lys 515 520 525 Phe Gly Ser Leu Ser Ile Gly Asp Ser AsnSer Val Phe Leu Gln Gly 530 535 540 Glu Arg Thr Ala Thr Lys Gly Asp LysAsp Lys Ala Met Pro Val Ala 545 550 555 560 Gly Asn Ala Lys Tyr Arg GlyThr Trp Ala Gly Tyr Val Ala Gly Ser 565 570 575 Gly Asn Thr Ser Lys AlaTyr Glu Ala Gln Gln Phe Ala Asp Asn Ala 580 585 590 Asn Arg Ala Glu PheAsp Val Asp Phe Ala Asn Lys Ser Leu Thr Gly 595 600 605 Lys Leu Ile ProAsn Thr Ser Ser Asp Gly Lys Ser Ala Phe Asp Ile 610 615 620 Thr Ala ThrIle Asp Gly Asn Gly Phe Ser Gly Lys Ala Asn Thr Pro 625 630 635 640 AspIle Glu Thr Gly Gly Leu Lys Ile Asp Ser Lys Asn Ser Glu Ser 645 650 655Gly Arg Val Ile Val Lys Asp Ala Ile Val Ile Gly Gly Phe Tyr Gly 
660 665670 Pro Gln Ala Asn Glu Leu Gly Gly Ser Phe Thr Tyr Lys Ser Asn Asp 675680 685 Ala Gly Asn Gln Asp Lys Asp Ser Ser Ala Ser Val Val Phe Gly Ala690 695 700 Arg Lys Gln Gln Glu Val Lys Pro 705 710 2142 base pairsnucleic acid single linear 52 ATGAAACACA TTCCTTTAAC CACACTGTGTGTGGCAATCT CTGCCGTCTT ATTAACCGCT 60 TGTGGTGGCA GTGGTGGTTC AAATCCACCTGCTCCTACGC CCATCCCAAA TGCAGGCAGT 120 GCAGGTAATG CTGGCGGTAC AGGAAATACAGGCGGTACTG GCAGTACTGA TAATGTAGGC 180 AATGCTGGCG GTGCAAACTC TGGTACAGGCAATGCAGGTA ATTCAGGTAA TGCAAACTCT 240 GGTACAGGCA GTGCCAACAC ACCAGAACCAAAATATCAAG ATGTGCCAAC CGATAAAAAT 300 GAAAAAGAAC AAGTTTCATC CATTCAAGAACCTGCCATGG GTTATGCAAT GGAATTAAAG 360 CTTCGTAATG CTCACCCTCT TAACCCAAATAAAAATAAAG AGGCTGAAAA ACGCATTGCC 420 TTAGACCAAA AAGATTTGGT GGCAGTAGAGGGCGACCTAA CCAACATTCC TTTTGATAAA 480 AATCTTATTG AATACCTTAA AAAATCATCCGAGGTTGTAA GTAAATTTGA AGCACAAAAA 540 GGCGGTATTG AAAATAACAC AAGACTGACACACAAAGATT TATCATCAGA GCAAAAAGAA 600 GCAAAAGTCA AAGAAGCGTT GGACAATGCTTTAACTCAAT TTGCCCAAGA AAAATACAAG 660 GAGCTAATTG AGAACGCCCA TGATAAAAAATCTGACGCAC GCAACCGTGA TCTAGAATAT 720 GTCAAGTCTG GTTTTAACTA TCTTTCTGGATATACCGCCA CCGACCACGA CAAAAAAACC 780 AATTATCGTG GCTATTATGG TGCGTTGTATTATAAAGGCA GCGAAACCGC CAAAGAGCTA 840 CCACAAACAA GTGCAAAATA TAAAGGTTATTGGGACTTTA TGACAGATGC CACACTTGAT 900 AACAAATACA CGGATTTGCC AGGTATCGCCAGACAAACCC AGTGGCGTAG TCTTGTTTCT 960 ACTGATGAGT ATGCAACGCT CTTGACAGACAAAAATAACA AGCCCAGTGA TTACAATGGT 1020 GCATATGGTC ATAGCAGTGA ATTTGATGTTAATTTTGCTG ATAAAAAAAT TAAAGGCAAA 1080 CTTATCAGTA ATCAGTTATC AGGCACAGCTGTAACCGCCA AAGAGCGTTA TAAAATAGAA 1140 GCTGATATCC ACGGCAACCG CTTCCGTGGCAGTGCCACCG CAAGCGATAA AGCAGAAGAC 1200 AGCAAAACCC AACACCCCTT TACCAGCGATGCTACAAACA AGCTAGAAGG TGGTTTTTAT 1260 GGACCAAAAG GCGAGGAGCT GGCAGGTAAATTCTTAACCG ATGACAACAA ACTCTTTGGG 1320 GTCTTTGGTG CTAAACGAGA TAAAGTAGAAAAAACCGAAG CCATCTTAGA TGCCTATGCA 1380 CTTGGGACAT TTAATAATAC AAATAAAGCAACCACATTCA CCCCATTTAC CAAAAAACAA 1440 CTGGATAACT TTGGCAATGC CAAAAAGTTGGTCTTGGGTT CTACCGTCAT TAATTTGGTG 1500 TCTACCGATG 
CCACCAAAAA TGAATTCACCAAAAAATTCA CCAAAGACAA GCCAACTTCT 1560 GCCACAAACA AAGCGGGCGA GACTTTGATGGTGAATGATG AAGTTATCGT CAAAACCTAT 1620 GGCAAAAACT TTGAATACCT AAAATTTGGTGAGCTTAGTG TCGGTGATAG CCATAGCGTC 1680 TTTTTACAAG GCGAACGCAC CGCTACCACAGGCGAGAAAG CCGTACCAAC CACAGGCAAA 1740 GCCAAATATC TGGGGAACTG GGTAGGATACATCACAGGAG CGGGCACAGG AAAAAGCTTT 1800 AATGAGGCCC AAGATATTGC TGATTTTGACATTGACTTTG AGAGAAAATC AGTTAAAGGC 1860 AAACTGACCA CCCAAGGCCG CACAGATCCTGTCTTTAACA TCAAAGGTGA AATTGCAGGC 1920 AATGGCTGGA CAGGCAAAGC CAGCACCACCAAAGCGGACG CAGGAGGCTA CAAGATAGAT 1980 TCTAGCAGTA CAGGCAAATC CATCGTCATCGAAAATGCCG AAGTTACTGG GGGCTTTTAT 2040 GGTCCAAATG CAAACGAGAT GGGCGGGTCATTTACACACG ATACCGATGA CAGTAAAGCC 2100 TCTGTGGTCT TTGGCACAAA AAGACAACAAGAAGTTAAGT AG 2142 713 amino acids amino acid single linear 53 Met LysHis Ile Pro Leu Thr Thr Leu Cys Val Ala Ile Ser Ala Val 1 5 10 15 LeuLeu Thr Ala Cys Gly Gly Ser Gly Gly Ser Asn Pro Pro Ala Pro 20 25 30 ThrPro Ile Pro Asn Ala Gly Ser Ala Gly Asn Ala Gly Gly Thr Gly 35 40 45 AsnThr Gly Gly Thr Gly Ser Thr Asp Asn Val Gly Asn Ala Gly Gly 50 55 60 AlaAsn Ser Gly Thr Gly Asn Ala Gly Asn Ser Gly Asn Ala Asn Ser 65 70 75 80Gly Thr Gly Ser Ala Asn Thr Pro Glu Pro Lys Tyr Gln Asp Val Pro 85 90 95Thr Asp Lys Asn Glu Lys Glu Gln Val Ser Ser Ile Gln Glu Pro Ala 100 105110 Met Gly Tyr Ala Met Glu Leu Lys Leu Arg Asn Ala His Pro Leu Asn 115120 125 Pro Asn Lys Asn Lys Glu Ala Glu Lys Arg Ile Ala Leu Asp Gln Lys130 135 140 Asp Leu Val Ala Val Glu Gly Asp Leu Thr Asn Ile Pro Phe AspLys 145 150 155 160 Asn Leu Ile Glu Tyr Leu Lys Lys Ser Ser Glu Val ValSer Lys Phe 165 170 175 Glu Ala Gln Lys Gly Gly Ile Glu Asn Asn Thr ArgLeu Thr His Lys 180 185 190 Asp Leu Ser Ser Glu Gln Lys Glu Ala Lys ValLys Glu Ala Leu Asp 195 200 205 Asn Ala Leu Thr Gln Phe Ala Gln Glu LysTyr Lys Glu Leu Ile Glu 210 215 220 Asn Ala His Asp Lys Lys Ser Asp AlaArg Asn Arg Asp Leu Glu Tyr 225 230 235 240 Val Lys Ser Gly Phe Asn TyrLeu Ser Gly Tyr Thr Ala Thr Asp His 245 250 255 Asp Lys Lys Thr Asn 
TyrArg Gly Tyr Tyr Gly Ala Leu Tyr Tyr Lys 260 265 270 Gly Ser Glu Thr AlaLys Glu Leu Pro Gln Thr Ser Ala Lys Tyr Lys 275 280 285 Gly Tyr Trp AspPhe Met Thr Asp Ala Thr Leu Asp Asn Lys Tyr Thr 290 295 300 Asp Leu ProGly Ile Ala Arg Gln Thr Gln Trp Arg Ser Leu Val Ser 305 310 315 320 ThrAsp Glu Tyr Ala Thr Leu Leu Thr Asp Lys Asn Asn Lys Pro Ser 325 330 335Asp Tyr Asn Gly Ala Tyr Gly His Ser Ser Glu Phe Asp Val Asn Phe 340 345350 Ala Asp Lys Lys Ile Lys Gly Lys Leu Ile Ser Asn Gln Leu Ser Gly 355360 365 Thr Ala Val Thr Ala Lys Glu Arg Tyr Lys Ile Glu Ala Asp Ile His370 375 380 Gly Asn Arg Phe Arg Gly Ser Ala Thr Ala Ser Asp Lys Ala GluAsp 385 390 395 400 Ser Lys Thr Gln His Pro Phe Thr Ser Asp Ala Thr AsnLys Leu Glu 405 410 415 Gly Gly Phe Tyr Gly Pro Lys Gly Glu Glu Leu AlaGly Lys Phe Leu 420 425 430 Thr Asp Asp Asn Lys Leu Phe Gly Val Phe GlyAla Lys Arg Asp Lys 435 440 445 Val Glu Lys Thr Glu Ala Ile Leu Asp AlaTyr Ala Leu Gly Thr Phe 450 455 460 Asn Asn Thr Asn Lys Ala Thr Thr PheThr Pro Phe Thr Lys Lys Gln 465 470 475 480 Leu Asp Asn Phe Gly Asn AlaLys Lys Leu Val Leu Gly Ser Thr Val 485 490 495 Ile Asn Leu Val Ser ThrAsp Ala Thr Lys Asn Glu Phe Thr Lys Lys 500 505 510 Phe Thr Lys Asp LysPro Thr Ser Ala Thr Asn Lys Ala Gly Glu Thr 515 520 525 Leu Met Val AsnAsp Glu Val Ile Val Lys Thr Tyr Gly Lys Asn Phe 530 535 540 Glu Tyr LeuLys Phe Gly Glu Leu Ser Val Gly Asp Ser His Ser Val 545 550 555 560 PheLeu Gln Gly Glu Arg Thr Ala Thr Thr Gly Glu Lys Ala Val Pro 565 570 575Thr Thr Gly Lys Ala Lys Tyr Leu Gly Asn Trp Val Gly Tyr Ile Thr 580 585590 Gly Ala Gly Thr Gly Lys Ser Phe Asn Glu Ala Gln Asp Ile Ala Asp 595600 605 Phe Asp Ile Asp Phe Glu Arg Lys Ser Val Lys Gly Lys Leu Thr Thr610 615 620 Gln Gly Arg Thr Asp Pro Val Phe Asn Ile Lys Gly Glu Ile AlaGly 625 630 635 640 Asn Gly Trp Thr Gly Lys Ala Ser Thr Thr Lys Ala AspAla Gly Gly 645 650 655 Tyr Lys Ile Asp Ser Ser Ser Thr Gly Lys Ser IleVal Ile Glu Asn 660 665 670 Ala Glu Val Thr Gly Gly Phe Tyr Gly Pro AsnAla Asn 
Glu Met Gly 675 680 685 Gly Ser Phe Thr His Asp Thr Asp Asp SerLys Ala Ser Val Val Phe 690 695 700 Gly Thr Lys Arg Gln Gln Glu Val Lys705 710 8266 base pairs nucleic acid single linear 54 GATGCCTGCCTTGTGATTGG TTGGGGTGTA TCGGTGTATC AAAGTGCAAA AGCCAACAGG 60 TGGTCATTGATGAATCAATC AAAACAAAAC AACAAATCCA AAAAATCCAA ACAAGTATTA 120 AAACTTAGTGCCTTGTCTTT GGGTCTGCTT AACATCACGC AGGTGGCACT GGCAAACACA 180 ACGGCCGATAAGGCGGAGGC AACAGATAAG ACAAACCTTG TTGTTGTCTT GGATGAAACT 240 GTTGTAACAGCGAAGAAAAA CGCCCGTAAA GCCAACGAAG TTACAGGGCT TGGTAAGGTG 300 GTCAAAACTGCCGAGACCAT CAATAAAGAA CAAGTGCTAA ACATTCGAGA CTTAACACGC 360 TATGACCCTGGCATTGCTGT GGTTGAGCAA GGTCGTGGGG CAAGCTCAGG CTATTCTATT 420 CGTGGTATGGATAAAAATCG TGTGGCGGTA TTGGTTGATG GCATCAATCA AGCCCAGCAC 480 TATGCCCTACAAGGCCCTGT GGCAGGCAAA AATTATGCCG CAGGTGGGGC AATCAACGAA 540 ATAGAATACGAAAATGTCCG CTCCGTTGAG ATTAGTAAAG GTGCAAATTC AAGTGAATAC 600 GGCTCTGGGGCATTATCTGG CTCTGTGGCA TTTGTTACCA AAACCGCCGA TGACATCATC 660 AAAGATGGTAAAGATTGGGG CGTGCAGACC AAAACCGCCT ATGCCAGTAA AAATAACGCA 720 TGGGTTAATTCTGTGGCAGC AGCAGGCAAG GCAGGTTCTT TTAGCGGTCT TATCATCTAC 780 ACCGACCGCCGTGGTCAAGA ATACAAGGCA CATGATGATG CCTATCAGGG TAGCCAAAGT 840 TTTGATAGAGCGGTGGCAAC CACTGACCCA AATAACCGAA CATTTTTAAT AGCAAATGAA 900 TGTGCCAATGGTAATTATGA GGCGTGTGCT GCTGGCGGTC AAACCAAACT TCAAGCCAAG 960 CCAACCAATGTGCGTGATAA GGTCAATGTC AAAGATTATA CAGGTCCTAA CCGCCTTATC 1020 CCAAACCCACTCACCCAAGA CAGCAAATCC TTACTGCTTC GCCCAGGTTA TCAGCTAAAC 1080 GATAAGCACTATGTCGGTGG TGTGTATGAA ATCACCAAAC AAAACTACGC CATGCAAGAT 1140 AAAACCGTGCCTGCTTATCT GGCGGTTCAT GACATTGAAA AATCAAGGCT CAGCAACCAT 1200 GCCCAAGCCAATGGCTATTA TCAAGGCAAT AATCTTGGTG AACGCATTCG TGATACCATT 1260 GGGCCAGATTCAGGTTATGG CATCAACTAT GCTCATGGCG TATTTTATGA TGAAAAACAC 1320 CAAAAAGACCGCCTAGGGCT TGAATATGTT TATGACAGCA AAGGTGAAAA TAAATGGTTT 1380 GATGATGTGCGTGTGTCTTA TGATAAGCAA GACATTACGC TACGCAGCCA GCTGACCAAC 1440 ACGCACTGTTCAACCTATCC GCACATTGAC AAAAATTGTA CGCCTGATGT CAATAAACCT 1500 TTTTCGGTAAAAGAGGTGGA TAACAATGCC TACAAAGAAC AGCACAATTT AATCAAAGCC 1560 
GTCTTTAACAAAAAAATGGC GTTGGGCAGT ACGCATCATC ACATCAACCT GCAAGTTGGC 1620 TATGATAAATTCAATTCAAG CCTGAGCCGT GTAGAATATC GTTTGGCAAC CCATCAGTCT 1680 TATCAAAAACTTGATTACAC CCCACCAAGT AACCCTTTGC CAGATAAGTT TAAGCCCATT 1740 TTAGGTTCAAACAACAAACC CATTTGCCTT GATGCTTATG GTTATGGTCA TGACCATCCA 1800 CAGGCTTGTAACGCCAAAAA CAGCACTTAT CAAAATTTTG CCATCAAAAA AGGCATAGAG 1860 CAATACAACCAAAAAACCAA TACCGATAAG ATTGATTATC AAGCCATCAT TGACCAATAT 1920 GATAAACAAAACCCCAACAG CACCCTAAAA CCCTTTGAGA AAATCAAACA AAGTTTGGGG 1980 CAAGAAAAATACAACAAGAT AGACGAACTT GGCTTTAAAG CTTATAAAGA TTTACGCAAC 2040 GAATGGGCGGGTTGGACTAA TGACAACAGC CAACAAAATG CCAATAAAGG CACGGATAAT 2100 ATCTATCAGCCAAATCAAGC AACTGTGGTC AAAGATGACA AATGTAAATA TAGCGAGACC 2160 AACAGCTATGCTGATTGCTC AACCACTCGC CACATCAGTG GTGATAATTA TTTCATCGCT 2220 TTAAAAGACAACATGACCAT CAATAAATAT GTTGATTTGG GGCTGGGTGC TCGCTATGAC 2280 AGAATCAAACACAAATCTGA TGTGCCTTTG GTAGACAACA GTGCCAGCAA CCAGCTGTCT 2340 TGGAATTTTGGCGTGGTCGT CAAGCCCACC AATTGGCTGG ACATCGCTTA TAGAAGCTCG 2400 CAAGGCTTTCGCATGCCAAG TTTTTCTGAA ATGTATGGCG AACGCTTTGG CGTAACCATC 2460 GGTAAAGGCACGCAACATGG CTGTAAGGGT CTTTATTACA TTTGTCAGCA GACTGTCCAT 2520 CAAACCAAGCTAAAACCTGA AAAATCCTTT AACCAAGAAA TCGGAGCGAC TTTACATAAC 2580 CACTTAGGCAGTCTTGAGGT TAGTTATTTT AAAAATCGCT ATACCGATTT GATTGTTGGT 2640 AAAAGTGAAGAGATTAGAAC CCTAACCCAA GGTGATAATG CAGGCAAACA GCGTGGTAAA 2700 GGTGATTTGGGCTTTCATAA TGGACAAGAT GCTGATTTGA CAGGAATTAA CATTCTTGGC 2760 AGACTTGACCTAAACGCTGC CAATAGTCGC CTTCCCTATG GATTATACTC AACACTGGCT 2820 TATAACAAAGTTGATGTTAA AGGAAAAACC TTAAACCCAA CTTTGGCAGG AACAAACATA 2880 CTGTTTGATGCCATCCAGCC ATCTCGTTAT GTGGTGGGGC TTGGCTATGA TGCCCCAAGC 2940 CAAAAATGGGGAGCAAACGC CATATTTACC CATTCTGATG CCAAAAATCC AAGCGAGCTT 3000 TTGGCAGATAAGAACTTAGG TAATGGCAAC ATTCAAACAA AACAAGCCAC CAAAGCAAAA 3060 TCCACGCCGTGGCAAACACT TGATTTGTCA GGTTATGTAA ACATAAAAGA TAATTTTACC 3120 TTGCGTGCTGGCGTGTACAA TGTATTTAAT ACCTATTACA CCACTTGGGA GGCTTTACGC 3180 CAAACAGCAAAAGGGGCGGT CAATCAGCAT ACAGGACTGA GCCAAGATAA GCATTATGGT 3240 CGCTATGCCGCTCCTGGACG CAATTACCAA 
TTGGCACTTG AAATGAAGTT TTAACCAGTG 3300 GCTTTGATGTGATTTTGGCA TGCCAAATCC CAATCAACCA ATGAATAAAG CCCCCATTAC 3360 CATGAGGGCTTTATTTTATC ATCGCTGAGT ATGCTCTTAG CGGTCATCAC TCAGATTAGT 3420 CATTAATTTATTAGCGATTA ATTTATTAGT AATCACGCTG CTCTTTGATG ATTTTAAGTG 3480 ATGGGTATTCAAGAACGATG TCATACTCAG CACCGTTTTT ATAGGCTTCT ACTTCAAAGA 3540 CAGGCTTGCCTAAAAAGTCA TCAACTTCTA TATCGCCGAC TTGATAGCCA CGAGCAGCAA 3600 GCATTTGAATGGCTTTTTGA CGATTTTGGG CAAAGTTGCT GTCGCCATAA GCTTGTGCTT 3660 TAATACGGTCGTTAGCAACT GCGGTGGTAG AGATACCAAC GGCAGGCAAC AAAACAGCAG 3720 CACTTAGTACGCCAGCCAAC AGTTTATTGG TTAAATTTTT CATAGTAGTT TCCTAATTAT 3780 TATCATTGTAATTCATGTTT ATCGTTATAA ACAATCGTTA TAAATAACTG TGTCGTGATA 3840 ACCATTAATCACAAGTGGGT TAAATGCCTT TTGCCCAATG GCAAATAGGC ACAATGCTCT 3900 GCTTGTTCTATGATGGTCTA TTATGATCAT CATTTTATTG ACCTATTTTT TTAATCGTAA 3960 TGTTTGTTTGATGTTAGTAT AAATTTTATC AATCAAACAA TCACAAATTA TATCAATCAT 4020 AGACGGTAAACAGGCTTCAT ATTTTACGCA TATTTCCCCA GATGTCTGTA GTGTTTCATA 4080 GATGATTTGTAAAACAATTG TCGGTCATTA TTATCAATTG TAAACTGATG GCTAATTTGT 4140 AACCTTATGGCTAATGATAA TATGAATAAA GCGTTATACT GTATCAAAGA ATGAGTAAAA 4200 ACCATCAATGGTATCTTATT TATCATCAGG TTGTGTTAAT AAGATGCCAA TTAAGCGACT 4260 AATTTTGTAAATTAATTAAT AATCATTCAT ATTTGTATTT TTAAATACCA TAAAAAATGG 4320 TAAAATATGCTCGCTTTTTT GATAGGAGCT GTCATGACAA TCACGCCTGT TTATACCACA 4380 TTCACCCCCACCAAAACACC CATAAAATTT TTTATGGCTG GCTTGACTTT TCTAATCGCT 4440 CATATCAGCCATGCCGATGA TGGTCGCACC GACAATCAAG AGCTAATCAA TCAAGAAATA 4500 GCCACCCTTGAACCCATCAT TAACCATGCT CAGCCTGAGT TATTGTCCCA TGATGCATTA 4560 ACACCAAAAATAGAACCAAT ACTGGCACAA ACACCAAATC CTGCCGAAGA TACGCTCATC 4620 GCCGATGAGGCGTTACTGCT TGATAACCCT GATTTGCTCA ATCACGCCCT AAATTCTGCT 4680 GTCATGACCAATCATATGGC AGGCGTTCAC GCATTATTGC CCATTTATCA AAAACTGCCC 4740 AAAGACCATCAAAATGGCAT TTTACTTGGG TATGCCAATG CCTTGGCTGC TTTGGATAAG 4800 GGCAACGCCAAAAAAGCCAT TGATGAGCTA CGTCGCATCA TCGCCATCAT GCCTGAATAT 4860 AATGTGGTGCGTTTTCATCT GGCAAGGGCA TTATTTATGG ACAAACAAAA TGAAGCCGCC 4920 CTTGACCAGTTTAATAAATT ACATGCTGAC AACTTGCCAG AGGAGGTGCG GCAGGTTGTT 4980 
GGGCAGTACAGACAAGCGCT AAAACAACGA GATTCATGGA CATGGCAAGT AGGCATGAAT 5040 CTGGCCAAAGAAGACAACAT CAATCAAACC CCCAAAAACA CCACGCAAGG TCAATGGACT 5100 TTTGACAAACCCATTGACGC CATCACCCTA AGCTACCAAT TGGGGGCGGA TAAAAAGTGG 5160 TCTTTGCCCAAAGGGGCATA TGTGGGAGCG AACGCCCAAA TCTATGGCAA ACATCATCAA 5220 AATCACAAAAAATACAACGA CCATTGGGGC AGACTGGGGG CAAATTTGGG CTTTGCTGAT 5280 GCCAAAAAAGACCTTAGCAT TGAGACCTAT GGTGAAAAAA GATTTTATGG GCATGAGCGT 5340 TATACCGACACCATTGGCAT ACGCATGTCG GTTGATTATA GAATCAACCC AAAATTTCAA 5400 AGCCTAAACGCCATAGACAT ATCACGCCTA ACCAACCATC GGACGCCTAG GGCTGACAGT 5460 AATAACACTTTATACAGTAC CTCATTGATT TATTACCCAA ATGCCACACG CTATTATCTT 5520 TTGGGGGCAGACTTTTATGA TGAAAAAGTG CCACAAGACC CATCTGACAG TTATCAACGC 5580 CGTGGCATACGCACAGCGTG GGGGCAAGAA TGGGCGGGTG GTCTTTCAAG CCGTGCCCAA 5640 ATCAGCATCAACAAACGCCA TTACCAAGGG GCAAACCTAA CCAGCGGTGG ACAAATTCGC 5700 CATGATAAACAGATGCAAGC GTCTTTATCG CTTTGGCACA GAGACATTCA CAAATGGGGC 5760 ATCACGCCACGGCTGACCAT CAGCACAAAC ATCAATAAAA GCAATGACAT CAAGGCAAAT 5820 TATCACAAAAATCAAATGTT TGTTGAGTTT AGTCGCATTT TTTGATGGGA TAAGCACGCC 5880 CTACTTTTGTTTTTGTAAAA AAATGTGCCA TCATAGACAA TATCAAGAAA AAATCAAGAA 5940 AAAAAGATTACAAATTTAAT GATAATTGTT ATTGTTTATG TTATTATTTA TCAATGTAAA 6000 TTTGCCGTATTTTGTCTATC ATAAATGCAT TTATCAAATG CTCAAATAAA TACGCCAAAT 6060 GCACATTGTCAGCATGCCAA AATAGGCATC AACAGACTTT TTTAGATAAT ACCATCAACC 6120 CATCAGAGGATTATTTTATG AAACACATTC CTTTAACCAC ACTGTGTGTG GCAATCTCTG 6180 CCGTCTTATTAACCGCTTGT GGTGGCAGTG GTGGTTCAAA TCCACCTGCT CCTACGCCCA 6240 TTCCAAATGCTAGCGGTTCA GGTAATACTG GCAACACTGG TAATGCTGGC GGTACTGATA 6300 ATACAGCCAATGCAGGTAAT ACAGGCGGTA CAAACTCTGG TACAGGCAGT GCCAACACAC 6360 CAGAGCCAAAATATCAAGAT GTACCAACTG AGAAAAATGA AAAAGATAAA GTTTCATCCA 6420 TTCAAGAACCTGCCATGGGT TATGGCATGG CTTTGAGTAA AATTAATCTA CACAACCGAC 6480 AAGACACGCCATTAGATGAA AAAAATATCA TTACCTTAGA CGGTAAAAAA CAAGTTGCAG 6540 AAGGTAAAAAATCGCCATTG CCATTTTCGT TAGATGTAGA AAATAAATTG CTTGATGGCT 6600 ATATAGCAAAAATGAATGTA GCGGATAAAA ATGCCATTGG TGACAGAATT AAGAAAGGTA 6660 ATAAAGAAATCTCCGATGAA GAACTTGCCA 
AACAAATCAA AGAAGCTGTG CGTAAAAGCC 6720 ATGAGTTTCAGCAAGTATTA TCATCACTGG AAAACAAAAT TTTTCATTCA AATGACGGAA 6780 CAACCAAAGCAACCACACGA GATTTAAAAT ATGTTGATTA TGGTTACTAC TTGGCGAATG 6840 ATGGCAATTATCTAACCGTC AAAACAGACA AACTTTGGAA TTTAGGCCCT GTGGGTGGTG 6900 TGTTTTATAATGGCACAACG ACCGCCAAAG AGTTGCCCAC ACAAGATGCG GTCAAATATA 6960 AAGGACATTGGGACTTTATG ACCGATGTTG CCAACAGAAG AAACCGATTT AGCGAAGTGA 7020 AAGAAAACTCTCAAGCAGGC TGGTATTATG GAGCATCTTC AAAAGATGAA TACAACCGCT 7080 TATTAACTAAAGAAGACTCT GCCCCTGATG GTCATAGCGG TGAATATGGC CATAGCAGTG 7140 AGTTTACTGTTAATTTTAAG GAAAAAAAAT TAACAGGTAA GCTGTTTAGT AACCTACAAG 7200 ACCGCCATAAGGGCAATGTT ACAAAAACCG AACGCTATGA CATCGATGCC AATATCCACG 7260 GCAACCGCTTCCGTGGCAGT GCCACCGCAA GCAATAAAAA TGACACAAGC AAACACCCCT 7320 TTACCAGTGATGCCAACAAT AGGCTAGAAG GTGGTTTTTA TGGGCCAAAA GGCGAGGAGC 7380 TGGCAGGTAAATTCTTAACC AATGACAACA AACTCTTTGG CGTCTTTGGT GCTAAACGAG 7440 AGAGTAAAGCTGAGGAAAAA ACCGAAGCCA TCTTAGATGC CTATGCACTT GGGACATTTA 7500 ATACAAGTAACGCAACCACA TTCACCCCAT TTACCGAAAA ACAACTGGAT AACTTTGGCA 7560 ATGCCAAAAAATTGGTCTTA GGTTCTACCG TCATTGATTT GGTGCCTACT GATGCCACCA 7620 AAAATGAATTCACCAAAGAC AAGCCAGAGT CTGCCACAAA CGAAGCGGGC GAGACTTTGA 7680 TGGTGAATGATGAAGTTAGC GTCAAAACCT ATGGCAAAAA CTTTGAATAC CTAAAATTTG 7740 GTGAGCTTAGTATCGGTGGT AGCCATAGCG TCTTTTTACA AGGCGAACGC ACCGCTACCA 7800 CAGGCGAGAAAGCCGTACCA ACCACAGGCA CAGCCAAATA TTTGGGGAAC TGGGTAGGAT 7860 ACATCACAGGAAAGGACACA GGAACGGGCA CAGGAAAAAG CTTTACCGAT GCCCAAGATG 7920 TTGCTGATTTTGACATTGAT TTTGGAAATA AATCAGTCAG CGGTAAACTT ATCACCAAAG 7980 GCCGCCAAGACCCTGTATTT AGCATCACAG GTCAAATCGC AGGCAATGGC TGGACAGGGA 8040 CAGCCAGCACCACCAAAGCG GACGCAGGAG GCTACAAGAT AGATTCTAGC AGTACAGGCA 8100 AATCCATCGCCATCAAAGAT GCCAATGTTA CAGGGGGCTT TTATGGTCCA AATGCAAACG 8160 AGATGGGCGGGTCATTTACA CACAACGCCG ATGACAGCAA AGCCTCTGTG GTCTTTGGCA 8220 CAAAAAGACAACAAGAAGTT AAGTAGTAAT TTAAACACAA TGTTTG 8266 1539 base pairs nucleicacid single linear 55 ATGCTCGCTT TTTTGATAGG AGCTGTCATG ACAATCACGCCTGTTTATAC CACATTCACC 60 CCCACCAAAA CACCCATAAA ATTTTTTATG 
GCTGGCTTGACTTTTCTAAT CGCTCATATC 120 AGCCATGCCG ATGATGGTCG CACCGACAAT CAAGAGCTAATCAATCAAGA AATAGCCACC 180 CTTGAACCCA TCATTAACCA TGCTCAGCCT GAGTTATTGTCCCATGATGC ATTAACACCA 240 AAAATAGAAC CAATACTGGC ACAAACACCA AATCCTGCCGAAGATACGCT CATCGCCGAT 300 GAGGCGTTAC TGCTTGATAA CCCTGATTTG CTCAATCACGCCCTAAATTC TGCTGTCATG 360 ACCAATCATA TGGCAGGCGT TCACGCATTA TTGCCCATTTATCAAAAACT GCCCAAAGAC 420 CATCAAAATG GCATTTTACT TGGGTATGCC AATGCCTTGGCTGCTTTGGA TAAGGGCAAC 480 GCCAAAAAAG CCATTGATGA GCTACGTCGC ATCATCGCCATCATGCCTGA ATATAATGTG 540 GTGCGTTTTC ATCTGGCAAG GGCATTATTT ATGGACAAACAAAATGAAGC CGCCCTTGAC 600 CAGTTTAATA AATTACATGC TGACAACTTG CCAGAGGAGGTGCGGCAGGT TGTTGGGCAG 660 TACAGACAAG CGCTAAAACA ACGAGATTCA TGGACATGGCAAGTAGGCAT GAATCTGGCC 720 AAAGAAGACA ACATCAATCA AACCCCCAAA AACACCACGCAAGGTCAATG GACTTTTGAC 780 AAACCCATTG ACGCCATCAC CCTAAGCTAC CAATTGGGGGCGGATAAAAA GTGGTCTTTG 840 CCCAAAGGGG CATATGTGGG AGCGAACGCC CAAATCTATGGCAAACATCA TCAAAATCAC 900 AAAAAATACA ACGACCATTG GGGCAGACTG GGGGCAAATTTGGGCTTTGC TGATGCCAAA 960 AAAGACCTTA GCATTGAGAC CTATGGTGAA AAAAGATTTTATGGGCATGA GCGTTATACC 1020 GACACCATTG GCATACGCAT GTCGGTTGAT TATAGAATCAACCCAAAATT TCAAAGCCTA 1080 AACGCCATAG ACATATCACG CCTAACCAAC CATCGGACGCCTAGGGCTGA CAGTAATAAC 1140 ACTTTATACA GTACCTCATT GATTTATTAC CCAAATGCCACACGCTATTA TCTTTTGGGG 1200 GCAGACTTTT ATGATGAAAA AGTGCCACAA GACCCATCTGACAGTTATCA ACGCCGTGGC 1260 ATACGCACAG CGTGGGGGCA AGAATGGGCG GGTGGTCTTTCAAGCCGTGC CCAAATCAGC 1320 ATCAACAAAC GCCATTACCA AGGGGCAAAC CTAACCAGCGGTGGACAAAT TCGCCATGAT 1380 AAACAGATGC AAGCGTCTTT ATCGCTTTGG CACAGAGACATTCACAAATG GGGCATCACG 1440 CCACGGCTGA CCATCAGCAC AAACATCAAT AAAAGCAATGACATCAAGGC AAATTATCAC 1500 AAAAATCAAA TGTTTGTTGA GTTTAGTCGC ATTTTTTGA1539 512 amino acids amino acid single linear 56 Met Leu Ala Phe Leu IleGly Ala Val Met Thr Ile Thr Pro Val Tyr 1 5 10 15 Thr Thr Phe Thr ProThr Lys Thr Pro Ile Lys Phe Phe Met Ala Gly 20 25 30 Leu Thr Phe Leu IleAla His Ile Ser His Ala Asp Asp Gly Arg Thr 35 40 45 Asp Asn Gln Glu LeuIle Asn Gln Glu Ile Ala Thr Leu Glu Pro 
Ile 50 55 60 Ile Asn His Ala GlnPro Glu Leu Leu Ser His Asp Ala Leu Thr Pro 65 70 75 80 Lys Ile Glu ProIle Leu Ala Gln Thr Pro Asn Pro Ala Glu Asp Thr 85 90 95 Leu Ile Ala AspGlu Ala Leu Leu Leu Asp Asn Pro Asp Leu Leu Asn 100 105 110 His Ala LeuAsn Ser Ala Val Met Thr Asn His Met Ala Gly Val His 115 120 125 Ala LeuLeu Pro Ile Tyr Gln Lys Leu Pro Lys Asp His Gln Asn Gly 130 135 140 IleLeu Leu Gly Tyr Ala Asn Ala Leu Ala Ala Leu Asp Lys Gly Asn 145 150 155160 Ala Lys Lys Ala Ile Asp Glu Leu Arg Arg Ile Ile Ala Ile Met Pro 165170 175 Glu Tyr Asn Val Val Arg Phe His Leu Ala Arg Ala Leu Phe Met Asp180 185 190 Lys Gln Asn Glu Ala Ala Leu Asp Gln Phe Asn Lys Leu His AlaAsp 195 200 205 Asn Leu Pro Glu Glu Val Arg Gln Val Val Gly Gln Tyr ArgGln Ala 210 215 220 Leu Lys Gln Arg Asp Ser Trp Thr Trp Gln Val Gly MetAsn Leu Ala 225 230 235 240 Lys Glu Asp Asn Ile Asn Gln Thr Pro Lys AsnThr Thr Gln Gly Gln 245 250 255 Trp Thr Phe Asp Lys Pro Ile Asp Ala IleThr Leu Ser Tyr Gln Leu 260 265 270 Gly Ala Asp Lys Lys Trp Ser Leu ProLys Gly Ala Tyr Val Gly Ala 275 280 285 Asn Ala Gln Ile Tyr Gly Lys HisHis Gln Asn His Lys Lys Tyr Asn 290 295 300 Asp His Trp Gly Arg Leu GlyAla Asn Leu Gly Phe Ala Asp Ala Lys 305 310 315 320 Lys Asp Leu Ser IleGlu Thr Tyr Gly Glu Lys Arg Phe Tyr Gly His 325 330 335 Glu Arg Tyr ThrAsp Thr Ile Gly Ile Arg Met Ser Val Asp Tyr Arg 340 345 350 Ile Asn ProLys Phe Gln Ser Leu Asn Ala Ile Asp Ile Ser Arg Leu 355 360 365 Thr AsnHis Arg Thr Pro Arg Ala Asp Ser Asn Asn Thr Leu Tyr Ser 370 375 380 ThrSer Leu Ile Tyr Tyr Pro Asn Ala Thr Arg Tyr Tyr Leu Leu Gly 385 390 395400 Ala Asp Phe Tyr Asp Glu Lys Val Pro Gln Asp Pro Ser Asp Ser Tyr 405410 415 Gln Arg Arg Gly Ile Arg Thr Ala Trp Gly Gln Glu Trp Ala Gly Gly420 425 430 Leu Ser Ser Arg Ala Gln Ile Ser Ile Asn Lys Arg His Tyr GlnGly 435 440 445 Ala Asn Leu Thr Ser Gly Gly Gln Ile Arg His Asp Lys GlnMet Gln 450 455 460 Ala Ser Leu Ser Leu Trp His Arg Asp Ile His Lys TrpGly Ile Thr 465 470 475 480 Pro Arg Leu Thr Ile 
Ser Thr Asn Ile Asn LysSer Asn Asp Ile Lys 485 490 495 Ala Asn Tyr His Lys Asn Gln Met Phe ValGlu Phe Ser Arg Ile Phe 500 505 510 512 amino acids amino acid singlelinear 57 Met Leu Ala Phe Leu Ile Gly Ala Val Met Thr Ile Thr Pro ValTyr 1 5 10 15 Thr Thr Phe Thr Pro Thr Lys Thr Pro Ile Lys Phe Phe MetAla Gly 20 25 30 Leu Thr Phe Leu Ile Ala His Ile Ser His Ala Asp Asp GlyArg Thr 35 40 45 Asp Asn Gln Glu Pro Ile Asn Gln Glu Ile Ala Thr Leu GluPro Ile 50 55 60 Ile Asn His Ala Gln Pro Glu Leu Leu Ser His Gly Ala LeuThr Pro 65 70 75 80 Lys Thr Glu Pro Ile Leu Ala Gln Thr Pro Asn Pro AlaGlu Asp Thr 85 90 95 Leu Ile Ala Asp Glu Ala Leu Leu Leu Asp Asn Pro AspLeu Leu Asn 100 105 110 His Ala Leu Asn Ser Ala Val Met Thr Asn Asn MetAla Gly Val His 115 120 125 Ala Leu Leu Pro Ile Tyr Gln Lys Leu Pro LysAsp His Gln Asn Gly 130 135 140 Ile Leu Leu Gly Tyr Ala Asn Ala Leu ValAla Leu Asp Lys Gly Asn 145 150 155 160 Ala Lys Ala Ala Ile Gly Glu LeuArg Arg Ile Ile Ala Ile Met Pro 165 170 175 Glu Tyr Asn Val Val Arg PheHis Leu Ala Arg Ala Leu Phe Met Asp 180 185 190 Lys Gln Asn Glu Ala AlaLeu Asp Gln Phe Asn Lys Leu His Ala Asp 195 200 205 Asn Leu Pro Glu GluVal Arg Arg Val Val Gly Gln Tyr Arg Gln Ala 210 215 220 Leu Lys Gln ArgAsp Ser Trp Thr Trp Gln Val Gly Met Asn Leu Ala 225 230 235 240 Lys GluAsp Asn Ile Asn Gln Thr Pro Lys Asn Thr Thr Gln Gly Gln 245 250 255 TrpThr Phe Asp Lys Pro Ile Asp Ala Ile Thr Leu Ser Tyr Gln Leu 260 265 270Gly Ala Asp Lys Lys Trp Ser Leu Pro Lys Gly Ala Tyr Val Gly Ala 275 280285 Asn Ala Gln Ile Tyr Gly Lys His His Gln Asn His Lys Lys Tyr Asn 290295 300 Asp His Trp Gly Arg Leu Gly Ala Asn Leu Gly Phe Ala Asp Ala Lys305 310 315 320 Lys Asp Leu Ser Ile Glu Thr Tyr Gly Glu Lys Arg Phe TyrGly His 325 330 335 Glu Arg Tyr Thr Asp Thr Ile Gly Ile Arg Met Ser AlaAsp Tyr Arg 340 345 350 Ile Asn Pro Lys Phe Gln Ser Leu Asn Ala Ile AspIle Ser Arg Leu 355 360 365 Thr Asn His Arg Thr Pro Arg Ala Asp Ser AsnAsn Thr Leu Tyr Ser 370 375 380 Thr Ser Leu Ile Tyr Tyr Pro 
Asn Ala ThrArg Tyr Tyr Leu Leu Gly 385 390 395 400 Ala Asp Phe Tyr Asp Glu Lys ValPro Gln Asp Pro Ser Asp Ser Tyr 405 410 415 Glu Arg Arg Gly Ile Arg ThrAla Trp Gly Gln Glu Trp Ala Gly Gly 420 425 430 Leu Ser Ser Arg Ala GlnIle Ser Ile Asn Lys Arg His Tyr Gln Gly 435 440 445 Ala Asn Leu Thr SerGly Gly Gln Ile Arg Gln Asp Lys Gln Met Gln 450 455 460 Ala Ser Leu SerLeu Trp His Arg Asp Ile His Lys Trp Gly Ile Thr 465 470 475 480 Pro ArgLeu Thr Ile Ser Thr Asn Ile Asn Lys Ser Asn Asp Ile Lys 485 490 495 AlaAsn Tyr His Lys Asn Gln Met Phe Val Glu Phe Ser Arg Ile Phe 500 505 51023 base pairs nucleic acid single linear 58 GATGGGATAA GCACGCCCTA CTT 2325 base pairs nucleic acid single linear 59 CCCATCAGCC AAACAAACAT TGTGT25 7 amino acids amino acid single linear 60 Leu Glu Gly Gly Phe Tyr Gly1 5

What we claim is:
 1. A purified and isolated nucleic acid molecule consisting of a DNA sequence selected from the group consisting of: (a) a DNA sequence as set out in FIG. 5, 6, 10, 11, 27, 31, 32 or 33 (SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 45, 47, 48, 50 or 52) or the complementary DNA sequence thereto; and (b) a DNA sequence encoding an amino acid sequence as set out in FIG. 5, 6, 10, 11, 27, 31, 32 or 33 (SEQ ID NOS: 9, 10, 11, 12, 13, 14, 15, 16, 46, 49, 51 or 53) or the complementary DNA sequence thereto.
 2. A purified and isolated nucleic acid molecule consisting of a DNA sequence selected from the group consisting of: (a) a DNA sequence as set forth in FIGS. 27, 31, 32 or 33 (SEQ ID Nos: 45, 47, 48, 50 or 52) or the complementary DNA sequence thereto; and (b) a DNA sequence encoding an amino acid sequence as set forth in FIGS. 27, 31, 32 or 33 (SEQ ID Nos: 46, 49, 51 or 53) or the complementary sequence thereto.
 3. A purified and isolated nucleic acid molecule consisting of a DNA sequence possessing a restriction map selected from the group consisting of: nucleotides 1 to 3225 of M. catarrhalis strain 4223 tbpA gene as set forth in FIG. 3, nucleotides 1 to 2106 of M. catarrhalis strain 4223 tbpB gene as set forth in FIG. 4, nucleotides 1 to 3660 of M. catarrhalis strain Q8 tbpA gene as set forth in FIG. 8, nucleotides 1 to 3487 of M. catarrhalis strain Q8 tbpB gene as set forth in FIG. 9, nucleotides 1 to 2121 of M. catarrhalis strain M35 tbpB gene as set forth in FIG. 26, nucleotides 1 to 2145 of M. catarrhalis strain R1 tbpB gene as set forth in FIG. 28, nucleotides 1 to 2129 of M. catarrhalis strain 3 tbpB gene as set forth in FIG. 29, and nucleotides 1 to 2142 of M. catarrhalis strain LES1 tbpB gene as set forth in FIG. 30.
 4. A purified and isolated nucleic acid molecule encoding a functional transferrin receptor protein of a strain of Moraxella catarrhalis consisting of a DNA sequence which has at least about 90% sequence identity to any one of the DNA sequences selected from the group consisting of: (a) a DNA sequence as set forth in FIG. 5, 6, 10, 11, 27, 31, 32 or 33 (SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 48, 50, 51, 53 or 55); or (b) a DNA sequence encoding an amino acid sequence as set forth in FIG. 5, 6, 10, 11, 27, 31, 32 or 33 (SEQ ID NOS: 9, 10, 11, 12, 13, 14, 15, 16, 49, 52, 54 or 56).
 5. A vector adapted for transformation of a host comprising the nucleic acid molecule of claim 1 or 2.
 6. A vector adapted for transformation of a host comprising the nucleic acid molecule of claim 1 and further comprising expression means operatively coupled to the nucleic acid molecule for expression by the host of a transferrin receptor protein of a strain of Moraxella encoded by one of said DNA sequences.
 7. The vector of claim 6 which is of a plasmid selected from the group consisting of pLEM-37 having ATCC Deposit No. 97,834, SLRD35-A having ATCC Deposit No. 97,833 and SLRD35-B shown in FIG. 20B.
 8. A transformed host containing an expression vector as claimed in claim 6.
 9. A method of forming a substantially pure recombinant transferrin receptor protein of a strain of Moraxella, which comprises: growing the transformed host of claim 8 to express a transferrin receptor protein as inclusion bodies, purifying the inclusion bodies free from cellular material and soluble proteins, solubilizing transferrin receptor protein from the purified inclusion bodies, and purifying the transferrin receptor protein free from other solubilized materials.
 10. The method of claim 9 wherein said transferrin receptor protein comprises Tbp1 alone, Tbp2 alone or a mixture of Tbp1 and Tbp2.
 11. The method of claim 10 wherein said transferrin receptor protein is at least about 70% pure.
 12. The method of claim 11 wherein said transferrin receptor protein is at least about 90% pure.
 13. A diagnostic kit for determining the presence, in a sample, of nucleic acid encoding a transferrin receptor protein of a strain of Moraxella, comprising: (a) the nucleic acid molecule of claim 1; (b) means for contacting the nucleic acid molecule with the sample to produce duplexes comprising the nucleic acid molecule and any said nucleic acid present in the sample and hybridizable with the nucleic acid molecule; and (c) means for determining production of the duplexes.